I have read the following posts but I am still unsure of something.
Python Compilation/Interpretation Process
Why python compile the source to bytecode before interpreting?
If I have a single Python file myfunctions.py containing the following code.
x = 3

def f():
    print x
    x = 2
Then, saying $ python myfunctions.py runs perfectly fine.
But now make one small change to the above file. The new file looks as shown below.
x = 3

def f():
    print x
    x = 2

f()  # there is a function call now
This time, the code gives out an error. Now, I am trying to understand this behavior. And so far, these are my conclusions.
Python creates bytecode for x=3
It creates a function object f, quickly scans it, and emits bytecode that knows about the local variables within f's scope, but note that bytecode for every statement is unlikely to have been constructed at this point.
Now Python encounters the function call. It knows the call is legitimate because the bare minimum bytecode describing the function object f and its local variables is present.
Now the interpreter takes charge of executing the bytecode, but from that initial footprint it knows x is a local variable here and says: "Why are you printing before you assign?"
Can someone please comment on this? Thanks in advance. And sorry if this has been addressed before.
When the interpreter reads a function, for each "name" (variable) it encounters, the interpreter decides whether that name is local or non-local. The criterion it uses is pretty simple ... Is there an assignment statement anywhere in the body to that name (barring global statements)? e.g.:
def foo():
    x = 3  # interpreter will tag `x` as a local variable since we assign to it here.
If there is an assignment statement to that name, then the name is tagged as "local", otherwise, it gets tagged as non-local.
Now, in your case, you try to print a variable which was tagged as local, but you do so before you've actually reached the critical assignment statement. Python looks for a local name, but doesn't find it so it raises the UnboundLocalError.
Python is very dynamic and allows you to do lots of crazy things which is part of what makes it so powerful. The downside of this is that it becomes very difficult to check for these errors unless you actually run the function -- In fact, python has made the decision to not check anything other than syntax until the function is run. This explains why you never see the exception until you actually call your function.
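For illustration, here is a minimal Python 3 sketch of that behaviour (the names are mine); defining the function is silent, and the error only shows up at call time:

x = 3

def whoops():
    print(x)   # x is tagged as local because of the assignment below
    x = 2

# Defining whoops() raised nothing; only calling it fails:
try:
    whoops()
except UnboundLocalError as exc:
    print(exc)  # e.g. "local variable 'x' referenced before assignment" (wording varies by version)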
If you want python to tag the variable as global, you can do so with an explicit global¹ statement:
x = 3

def foo():
    global x
    print x
    x = 2

foo()    # prints 3
print x  # prints 2
¹ Python 3.x takes this concept even further and introduces the nonlocal keyword.
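For reference, a minimal Python 3 sketch of nonlocal (the names are illustrative): it rebinds the enclosing function's variable instead of a module-level one:

def outer():
    x = 3
    def inner():
        nonlocal x  # rebind outer()'s x rather than creating a new local
        print(x)    # prints 3
        x = 2
    inner()
    print(x)        # prints 2

outer()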
mgilson got half of the answer.
The other half is that Python doesn't go looking for errors beyond syntax errors in functions (or function objects) it is not about to execute. So in the first case, since f() doesn't get called, the order-of-operations error isn't checked for.
In this respect, it is not like C and C++, which require everything to be fully declared up front. It's kind of like C++ templates, where errors in template code might not be found until the code is actually instantiated.
Related
In the code below:
def f():
    a = 'x'

    def g():
        print(a)
        if a == 'x':
            return True
        return False

    def h():
        print(a)

        def i():
            a = a + a
            return a

        a = i()
        return a

    if g():
        return h()
Why is a accessible in function g, but not in function h or i?
I don't want to use nonlocal since I don't want to modify a in any of the inner functions, however I don't see why a itself is not accessible.
Short answer: because you assigned to a (by writing a = a + a and a = i()), you created local variables. The fact that you use the variable before the assignment does not matter.
Python determines the scope by looking at assignments. If you write an assignment such as a = ..., a += ..., etc. anywhere in the function, no matter where, the function treats a as a local variable.
So in case you write:
a = 2

def f():
    print(a)
    a = 3
Even though you access a before you assign to it, Python still sees a as a local variable in f; it does not do any code-path analysis here. Calling f() will raise an error, complaining that you fetch a before it is actually assigned.
If a variable is not defined locally, Python iteratively inspects the outer scopes until it finds an a.
The only way to still access (and rebind) a variable from an outer scope, once you assign to that name in the current scope, is to declare it nonlocal or global (or, of course, to pass it in as a parameter).
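To make the contrast concrete, here is a small Python 3 sketch (my own names) of the two cases: an inner function that only reads a, and one that also assigns to it:

def outer():
    a = 'x'

    def reads_only():
        return a          # no assignment to a anywhere here, so this reads the enclosing a

    def assigns_too():
        a = a + a         # this assignment makes a local to assigns_too...
        return a          # ...so the read on the line above already fails

    print(reads_only())   # prints x
    # assigns_too()       # would raise UnboundLocalError

outer()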
The other answer is great for explaining what's going wrong. I'm adding my own answer to try to explain some of the reasons behind the issue (i.e. the "why" rather the "what").
First you need to understand Python's architecture a little bit. We often describe Python as an "interpreted" language rather than a "compiled" language like C, but that's not really the whole story. While Python doesn't compile directly to machine code, the interpreter doesn't run on the raw source code when the program is running. Rather, there's an intermediate step where the source code is compiled to bytecode. The compiling happens automatically when a module is loaded, so you may not even be aware of it (though you may have seen the .pyc files that the compiler writes to cache the bytecode).
Anyway, to get back to your scope issue: Python's compiler uses a different bytecode instruction to tell the interpreter to access a local variable than it uses for accessing a global variable (and a third instruction is used for accessing a variable from an enclosing function's scope). Since the bytecode is written by the compiler, the instruction to use needs to be decided at a function's compile time, not when the function is called. The choice of instruction is tricky, though, for ambiguous code like this:
a = 1

def foo():
    if bar():
        a = 2
    print(a)
Does the access of a for the print call use the bytecode instruction that reads the global variable a or the instruction that accesses the local variable a? There's no way for the compiler to know in advance if bar will return a true value or not, so there's no possible answer that will let the function work in all situations.
To avoid ambiguity, Python's designers chose that the scope of a variable should be constant throughout each function (so the compiler can just pick one bytecode instruction and stick with it). That is, a name like a can refer to a local or a global (or a closure cell), but only one of those in any given function.
The compiler defaults to using local variables (which are the fastest to access) for any name used as the target of an assignment anywhere in the function's code. Since inner functions are compiled at the same time as the functions that contain them, non-local lookups can also be detected at compile time and the appropriate instruction used. If the name isn't found in either the local or the enclosing scopes, the compiler assumes it is a global variable (which doesn't need to be defined yet). The global and nonlocal statements allow you to explicitly tell the compiler to use a specific scope (overriding what it would pick on its own).
You can explore the different ways the compiler handles variable lookups in different scopes using the dis module from the standard library. The dis module disassembles bytecode into a more readable format. Try calling dis.dis on functions like these:
a = 1

def load_global():
    print(a)  # access the global variable "a"

def load_fast():
    a = 2
    print(a)  # access the local variable "a", which shadows the global variable

def closure():
    a = 2
    def load_dref():
        print(a)  # access the variable "a" from the enclosing scope
    return load_dref

load_dref = closure()  # both dis.dis(closure) and dis.dis(load_dref) are interesting
The full details of how to interpret the output of dis.dis are beyond the scope (no pun intended) of this answer, but the main things to look for are the LOAD_... bytecode instructions that take a as their argument. You'll see three different LOAD_... instructions in the different functions above, corresponding to the three different kinds of scopes they're reading from (each function is named for the corresponding instruction).
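For example, assuming the functions above have been defined, something like the following shows the contrast (the exact output format varies between Python versions, but these are the opcode names to look for):

import dis

dis.dis(load_global)   # expect a LOAD_GLOBAL for a
dis.dis(load_fast)     # expect a LOAD_FAST for a
dis.dis(load_dref)     # expect a LOAD_DEREF for a (a closure-cell read)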
Suppose I have two functions one after another in the same python file:
def A(n):
    B(n-1)

# if I add A(1) here, it gives me an error

def B(n):
    if n <= 0:
        return
    else:
        A(n-1)
When the interpreter is reading A, B is not yet defined, however this code does not give me an error. This confuses me, because I thought that Python programs are interpreted line by line. How come the attempt to call B within A doesn't give an error immediately, before anything is called?
My understanding is that, when def is interpreted, Python adds an entry to some local namespace (locals()) mapping the function name to the function object, but as for the function body, it only does a syntax check:
def A():
    # this will give an error as it isn't a valid expression
    syntax error

def B():
    # even though x is not defined, this does not give an error
    print(x)

# same as above, NameError is only detected during runtime
A()
Do I have it right?
The line B(n-1) says "when this statement is executed, look up some function B in the module scope, then call it with the argument n-1". Since the lookup happens when the function is executed, B can be defined later.
(Additionally, you can completely overwrite B with a different function, and A will call the new B afterwards. But that can lead to some confusing code.)
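A small sketch of that rebinding (the print messages are mine):

def A(n):
    B(n - 1)

def B(n):
    print("original B got", n)

A(3)              # prints: original B got 2

def B(n):         # rebind the module-level name B...
    print("new B got", n)

A(3)              # ...and A now calls the new one: prints: new B got 2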
If you're worried about not catching calls to nonexistent functions, you can try using static analysis tools. Other than that, be sure you're testing your code.
A SyntaxError will be caught at compile time, but most other errors (NameError, ValueError, etc.) will be caught only at runtime, and then only if that function is called.
"if I have written a function, if its not called in my test.." - and that is why you should test everything.
Some IDEs will raise warnings in various situations, but the best option is still to conduct thorough testing yourself. This way, you can also check for errors that arise through factors like user input, which an IDE's automated checks won't cover.
When the interpreter is reading A, B is not yet defined, however this code does not give me an error
The reason the Python interpreter doesn't give an error can be found in the docs; technically this is known as a forward reference:
Name resolution of free variables occurs at runtime, not at compile time.
Considering the first example code specifically:
In Python, def A(): is an executable statement. It is evaluated by using the code inside the block to create a function object, and then assigning that object to the name A. This does more than just a "syntax check"; it actually compiles the code (in the reference implementation, this means it produces bytecode for the Python VM). That process involves determining that B i) is a name, and ii) will be looked up globally (because there is no assignment to B within A).
We can see the result using the dis module from the standard library:
>>> def A(n):
... B(n-1)
...
>>> import dis
>>> dis.dis(A)
2 0 LOAD_GLOBAL 0 (B)
2 LOAD_FAST 0 (n)
4 LOAD_CONST 1 (1)
6 BINARY_SUBTRACT
8 CALL_FUNCTION 1
10 POP_TOP
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
This result is version-specific and implementation-dependent; I have shown what I get using the reference implementation of Python 3.8. As you can infer, the LOAD_GLOBAL opcode represents the instruction to look for B in the global namespace (and failing that, in the special built-in names).
However, none of the code actually runs until the function is called. So it does not matter that B isn't defined in the global namespace yet; it will be by the time it's needed.
From comments:
but why python does not check for NameError at compile time?
Because there is nothing sensible to check the names against. Python is a dynamic language; the entire point is that you have the freedom to Do Whatever It Takes to ensure that B is defined before A uses it. Including, for example, extremely bad ideas like downloading another Python file from the Internet and dynamically executing it in the current global namespace.
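As a (deliberately contrived) illustration, B can even be created dynamically, as long as it exists by the time A runs:

def A(n):
    B(n - 1)      # B is looked up in the module namespace only when A is called

# B does not exist yet; calling A(1) here would raise NameError.
# Creating it via exec is one (extreme) way to make the later call legal:
exec("def B(n):\n    print('B got', n)")

A(5)              # prints: B got 4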
I would like to write a function which receives a local namespace dictionary and update it. Something like this:
def UpdateLocals(local_dict):
    d = {'a': 10, 'b': 20, 'c': 30}
    local_dict.update(d)
When I call this function from the interactive python shell it works all right, like this:
a = 1
UpdateLocals(locals())
print a  # prints 10
However, when I call UpdateLocals from inside a function, it doesn't do what I expect:
def TestUpdateLocals():
    a = 1
    UpdateLocals(locals())
    print a  # prints 1

TestUpdateLocals()
How can I make the second case work like the first?
UPDATE:
Aswin's explanation makes sense and is very helpful to me. However I still want a mechanism to update the local variables. Before I figure out a less ugly approach, I'm going to do the following:
def LoadDictionary():
    return {'a': 10, 'b': 20, 'c': 30}

def TestUpdateLocals():
    a = 1
    for name, value in LoadDictionary().iteritems():
        exec('%s = value' % name)
Of course the construction of the string statements can be automated, and the details can be hidden from the user.
You have asked a very good question. In fact, the ability to update local variables is very important and crucial in saving and loading datasets for machine learning or in games. However, most developers of the Python language have not come to realize its importance. They focus too much on conformity and optimization, which are nevertheless important too.
Imagine you are developing a game or running a deep neural network (DNN), if all local variables are serializable, saving the entire game or DNN can be simply put into one line as print(locals()), and loading the entire game or DNN can be simply put into one line as locals().update(eval(sys.stdin.read())).
Currently, globals().update(...) takes immediate effect but locals().update(...) does not work because Python documentation says:
The default locals act as described for function locals() below:
modifications to the default locals dictionary should not be
attempted. Pass an explicit locals dictionary if you need to see
effects of the code on locals after function exec() returns.
The reason Python is designed this way has to do with optimization and with turning the exec statement into a function:
To modify the locals of a function on the fly is not possible without
several consequences: normally, function locals are not stored in a
dictionary, but an array, whose indices are determined at compile time
from the known locales. This collides at least with new locals added
by exec. The old exec statement circumvented this, because the
compiler knew that if an exec without globals/locals args occurred in
a function, that namespace would be "unoptimized", i.e. not using the
locals array. Since exec() is now a normal function, the compiler does
not know what "exec" may be bound to, and therefore can not treat it
specially.
Since globals().update(...) works, the following piece of code will work in the root namespace (i.e., outside any function), because locals() is the same as globals() in the root namespace:
locals().update({'a':3, 'b':4})
print(a, b)
But this will not work inside a function.
However, as hacker-level Python programmers, we can use sys._getframe(1).f_locals instead of locals(). From what I have tested so far, on Python 3, the following piece of code always works:
import sys

def f1():
    sys._getframe(1).f_locals.update({'a': 3, 'b': 4})
    print(a, b)

f1()
However, sys._getframe(1).f_locals does not work in root namespace.
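For read-only inspection, at least, the frame hack is unambiguous. A minimal sketch (the helper name is mine):

import sys

def show_caller_locals():
    # Reading the caller's f_locals is safe, whether or not writes to it would stick.
    caller = sys._getframe(1)
    print(sorted(caller.f_locals))

def demo():
    x = 42
    show_caller_locals()   # prints ['x']

demo()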
The locals are not updated here because, in the first case, the variable declared has a global scope. But when declared inside a function, the variable loses scope outside it.
Thus, the original value of the locals() is not changed in the UpdateLocals function.
PS: This might not be related to your question, but using camel case is not good practice in Python. Try snake_case instead:
update_locals() instead of UpdateLocals()
Edit To answer the question in your comment:
There is something called a system stack (the call stack). Its main job during the execution of code is to manage local variables and to make sure control returns to the correct statement after the called function finishes executing.
So, every time a function call is made, a new entry is created on that stack, which contains the line number (or instruction number) to which control has to return after the return statement, and a fresh set of local variables.
While control is inside the function, its local variables are taken from that stack entry. Thus, the sets of locals in the two functions are not the same. The stack entry is popped when control exits the function, so the changes you made inside the function are erased, unless those variables have global scope.
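In CPython you can see this directly: the dictionary returned by locals() inside a function is only a snapshot of that stack entry, so mutating it does not touch the real local variables. A minimal sketch:

def snapshot_demo():
    a = 1
    d = locals()    # a snapshot of this frame's locals
    d['a'] = 99     # mutating the snapshot...
    print(a)        # ...does not change the real local: prints 1

snapshot_demo()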
Does python have an equivalent to Tcl's uplevel command? For those who don't know, the "uplevel" command lets you run code in the context of the caller. Here's how it might look in python:
def foo():
    answer = 0
    print "answer is", answer # should print 0
    bar()
    print "answer is", answer # should print 42

def bar():
    uplevel("answer = 42")
It's more than just setting variables, however, so I'm not looking for a solution that merely alters a dictionary. I want to be able to execute any code.
In general, what you ask is not possible (with the results you no doubt expect). E.g., imagine the "any code" is x = 23. Will this add a new variable x to your caller's set of local variables, assuming you do find a black-magical way to execute this code "in the caller"? No it won't -- the crucial optimization performed by the Python compiler is to define once and for all, when def executes, the exact set of local variables (all the barenames that get assigned, or otherwise bound, in the function's body), and turn every access and setting to those barenames into very fast indexing into the stackframe. (You could systematically defeat that crucial optimization e.g. by having an exec '' at the start of every possible caller -- and see your system's performance crash through the floor in consequence).
Except for assigning to the caller's local barenames, exec thecode in theglobals, thelocals may do roughly what you want, and the inspect module lets you get the locals and globals of the caller in a semi-reasonable way (in as far as deep black magic -- which would make me go postal on any coworker suggesting it be perpetrated in production code -- can ever be honored with the undeserved praise of calling it "semi-reasonable", that is;-).
But you do specify "I want to be able to execute any code." and the only solution to that unambiguous specification (and thanks for being so precise, as it makes answering easier!) is: then, use a different programming language.
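For completeness, here is a hedged sketch (Python 3) of the "roughly what you want" approach mentioned above; run_in_caller is my own name, not a standard API. Reads of the caller's names work, but rebinding the caller's local barenames will not persist, for exactly the reasons described:

import inspect

def run_in_caller(source):
    # Execute source against the caller's globals and locals.
    frame = inspect.currentframe().f_back
    try:
        exec(source, frame.f_globals, frame.f_locals)
    finally:
        del frame    # avoid keeping the frame alive via a reference cycle

def caller():
    answer = 0
    run_in_caller("print('caller sees answer =', answer)")
    # run_in_caller("answer = 42") would NOT change this function's answer

caller()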
Is the third party library written in Python? If yes, you could rewrite and rebind the function "foo" at runtime with your own implementation. Like so:
import third_party

original_foo = third_party.foo

def my_foo(*args, **kwds):
    # do your magic...
    original_foo(*args, **kwds)

third_party.foo = my_foo
I guess monkey-patching is slightly better than rewriting frame locals. ;)
I've written a remote Python debugger and one of the features I need is to execute arbitrary code while stopped at a breakpoint. My debugger uses the following to execute code received from the remote debugger:
exec(compile(code, '<string>', 'single'), frame.f_globals, frame.f_locals)
This works fine for the most part, but I've noticed a couple issues.
Assignment statements aren't actually applied to the original locals dictionary. This is probably due to the fact that f_locals is supposed to be read-only.
If stopped within a class method, accessing protected attributes (names beginning with double underscore) does not work. I'm assuming this is due to the name mangling that Python performs on protected attributes.
So my question is, is there a way around these limitations? Can I trick Python into thinking that the code is being executed in the actual local scope of that frame?
I'm using CPython 2.7, and I'm willing to accept a solution/hack specific to this version.
Assignment statements aren't actually applied to the original locals dictionary. This is probably due to the fact that f_locals is supposed to be read-only.
Not exactly, but the bytecode for the function will not look at locals, using rather a simple but crucial optimization whereby local variables are in a simple array, avoiding runtime lookups. The only way to avoid this (and make the function much, much slower) is compiling different code, e.g. code starting with an exec '' to force the compiler to avoid the optimization (in Python 2; no way, in Python 3). If you need to work with existing bytecode, you're out of luck: there is no way to accomplish what you desire.
If stopped within a class method, accessing protected attributes (names beginning with double underscore) does not work. I'm assuming this is due to the name mangling that Python performs on protected attributes.
Yep, so this issue does allow a workaround: prepend _Classname to the name to mimic what the compiler does. Note that a double-underscore prefix means private; protected would be a single underscore (and would give you no trouble). Private names are specifically meant to avoid accidental clashes with names bound in subclasses (and they work decently for that one purpose, though not perfectly, and not for anything else ;-).
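A quick demonstration of that workaround (the class and attribute names are illustrative):

class Widget:
    def __init__(self):
        self.__state = 'ready'     # stored as _Widget__state by name mangling

w = Widget()
# print(w.__state)                 # AttributeError outside the class body
print(w._Widget__state)            # prints: ready -- the manual mangling described above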
I'm not sure I've understood you correctly, but exec does populate the locals parameter with assignments inside the code:
>>> loc = {}
>>> exec(compile('a=3', '<string>', 'single'), {}, loc)
>>> loc
{'a': 3}
Perhaps f_locals doesn't allow writes.
to execute arbitrary code while stopped at a breakpoint ... Can I trick Python into thinking that the code is being executed in the actual local scope of that frame?
The Python debugger, pdb, allows this. For example, let's say you are debugging the file tests/scopeTest.py, and you have the following line in your program, where the variable hasn't been declared in the program itself:
print (NOT_DEFINED_IN_PROGRAM)
so that running the code python tests/scopeTest.py would result in:
NameError: name 'NOT_DEFINED_IN_PROGRAM' is not defined
Now you would like to define that variable when stopped at that line in the debugger, and have the program continue executing, using that variable as if it had been defined in the program all along. In other words, you would like to effect the change within that scope, so that you can continue execution with that change made permanent. It is actually possible:
$ python -m pdb tests/scopeTest.py
> /home/user/tests/scopeTest.py(1)<module>()
-> print (NOT_DEFINED_IN_PROGRAM)
(Pdb) 'NOT_DEFINED_IN_PROGRAM' in locals()
False
(Pdb) NOT_DEFINED_IN_PROGRAM = 5
(Pdb) 'NOT_DEFINED_IN_PROGRAM' in locals()
True
(Pdb) step
5
Pdb does this through a compile and exec in its default function, which does the equivalent of:
code = compile(line + '\n', '<stdin>', 'single')
exec(code, self.curframe.f_globals, self.curframe_locals)
where self.curframe is a specific frame. Now, self.curframe_locals is not self.curframe.f_locals, because, as the setup function says:
# The f_locals dictionary is updated from the actual frame
# locals whenever the .f_locals accessor is called, so we
# cache it here to ensure that modifications are not overwritten.
self.curframe_locals = self.curframe.f_locals
Hope that helps, and is what you meant!
Take note that, even then, should you want to, for example, replace a function in the context of the program being debugged with a monkey-patched version, such as:
newGlobals['abs'] = myCustomAbsFunction
exec(code, newGlobals, locals)
the scope of the myCustomAbsFunction is not going to be the user program, but is going to be the context of where that function was defined, which is the debugger! There is a way around that too, but as it wasn't specifically asked, it is left as an exercise for the reader, for now. ^__^