Suppose I have two functions, one after another, in the same Python file:
def A(n):
    B(n-1)
# if I add A(1) here, it gives me an error
def B(n):
    if n <= 0:
        return
    else:
        A(n-1)
When the interpreter is reading A, B is not yet defined; however, this code does not give me an error. This confuses me, because I thought that Python programs are interpreted line by line. How come the attempt to call B within A doesn't give an error immediately, before anything is called?
My understanding is that, when def is interpreted, Python adds an entry to some local namespace, locals(), of the form {"function name": function address}, but for the function body it only does a syntax check:
def A():
    # this will give an error as it isn't a valid expression
    syntax error
def B():
    # even though x is not defined, this does not give an error
    print(x)
    # same as above, NameError is only detected during runtime
    A()
Do I have it right?
The line B(n-1) says "when this statement is executed, look up some function B in the module scope, then call it with the argument n-1". Since the lookup happens when the function is executed, B can be defined later.
(Additionally, you can completely overwrite B with a different function, and A will call the new B afterwards. But that can lead to some confusing code.)
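A small sketch of both points, assuming it is run as a single script: calling A before B exists fails with NameError, and once B is defined (and later rebound) every call to A picks up whichever B the module's globals hold at that moment. The return strings are made up for illustration.

```python
def A(n):
    # B is looked up in the module's globals each time A runs
    return B(n - 1)

first_error = None
try:
    A(1)  # B does not exist yet
except NameError as exc:
    first_error = str(exc)
print("first call failed:", first_error)

def B(n):
    return f"original B got {n}"

original = A(1)
print(original)  # original B got 0

def B(n):  # rebinding the global name B to a new function
    return f"replacement B got {n}"

replaced = A(1)
print(replaced)  # replacement B got 0
```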
If you're worried about not catching calls to nonexistent functions, you can try using static analysis tools. Other than that, be sure you're testing your code.
A SyntaxError will be caught at compile time, but most other errors (NameError, ValueError, etc.) will be caught only at runtime, and then only if that function is called.
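To see that split directly, the built-in compile() can be applied to source strings; the helper name compiles() and the two snippets are only illustrative. Compilation rejects invalid syntax but accepts a reference to an undefined name, which only fails once the function actually runs.

```python
def compiles(source):
    """Return None if `source` compiles, else the SyntaxError message."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

bad_syntax = "def A():\n    syntax error\n"
undefined_name = "def B():\n    print(x)\n"

print(compiles(bad_syntax))      # a SyntaxError message
print(compiles(undefined_name))  # None: x is only looked up when B runs

namespace = {}
exec(undefined_name, namespace)  # defines B without complaint
runtime_error = None
try:
    namespace["B"]()  # now the lookup of x happens, and fails
except NameError as exc:
    runtime_error = exc
print("at call time:", repr(runtime_error))
```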
"if I have written a function, if its not called in my test.." - and that is why you should test everything.
Some IDEs will raise warnings in various situations, but the best option is still to conduct thorough testing yourself. This way, you can also check for errors that arise through factors like user input, which an IDE's automated checks won't cover.
When the interpreter is reading A, B is not yet defined, however this code does not give me an error
The reason the Python interpreter doesn't give an error can be found in the docs; technically, this is called a forward reference:
Name resolution of free variables occurs at runtime, not at compile time.
Considering the first example code specifically:
In Python, def A(): is an executable statement. It is evaluated by using the code inside the block to create a function object, and then assigning that object to the name A. This does more than just a "syntax check"; it actually compiles the code (in the reference implementation, this means it produces bytecode for the Python VM). That process involves determining that B i) is a name, and ii) will be looked up globally (because there is no assignment to B within A).
We can see the result using the dis module from the standard library:
>>> def A(n):
...     B(n-1)
...
>>> import dis
>>> dis.dis(A)
  2           0 LOAD_GLOBAL              0 (B)
              2 LOAD_FAST                0 (n)
              4 LOAD_CONST               1 (1)
              6 BINARY_SUBTRACT
              8 CALL_FUNCTION            1
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
This result is version-specific and implementation-dependent; I have shown what I get using the reference implementation of Python 3.8. As you can infer, the LOAD_GLOBAL opcode represents the instruction to look for B in the global namespace (and, failing that, in the special built-in names).
However, none of the code actually runs until the function is called. So it does not matter that B isn't defined in the global namespace yet; it will be by the time it's needed.
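If you'd rather not depend on the exact, version-specific dis output, a sketch like the following checks the same facts programmatically: the compiled code object already lists B among the names it refers to, and the instruction that loads it is LOAD_GLOBAL (the expected opcode name assumes CPython).

```python
import dis

def A(n):
    B(n - 1)

# Names the compiled body refers to via global (or attribute) lookups
print(A.__code__.co_names)  # includes 'B'

# Which instructions load B, independent of exact offsets
loads_of_B = [ins.opname for ins in dis.get_instructions(A) if ins.argval == "B"]
print(loads_of_B)  # ['LOAD_GLOBAL'] on CPython
```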
From comments:
but why python does not check for NameError at compile time?
Because there is nothing sensible to check the names against. Python is a dynamic language; the entire point is that you have the freedom to Do Whatever It Takes to ensure that B is defined before A uses it. Including, for example, extremely bad ideas like downloading another Python file from the Internet and dynamically executing it in the current global namespace.
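As a tamer illustration of that freedom, here is a sketch in which A is compiled into an explicit namespace dictionary (the name namespace and the lambda standing in for B are hypothetical). B is only added afterwards, and A still finds it, because the global lookup happens in the function's __globals__ at call time.

```python
# A is compiled into its own namespace dict; B does not exist yet
namespace = {}
exec("def A(n):\n    return B(n - 1)\n", namespace)

# Any mechanism at all may supply B later, as long as it ends up in the
# same dict that A uses as its globals
namespace["B"] = lambda n: n * 10

A = namespace["A"]
print(A.__globals__ is namespace)  # True
print(A(3))                        # 20
```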
When inside tracing function, debugging a function call, is it possible to somehow retrieve the calling expression?
I can get the calling line number from the traceback object, but if there are several function calls (possibly to the same function) on that line (e.g. as subexpressions in a bigger expression), how could I learn where this call came from? I would be happy even with the offset from the start of the source line.
traceback.tb_lasti seems to give more granular context (the index of the last bytecode tried); is it somehow possible to connect a bytecode to its exact source range?
EDIT: Just to clarify: I need to extract the specific (sub)expression (the call site) from the calling source line.
Traceback frames have a line number too:
lineno = traceback.tb_lineno
You can also reach the code object, which will have a name, and a filename:
name = traceback.tb_frame.f_code.co_name
filename = traceback.tb_frame.f_code.co_filename
You can use the filename and line number, plus the frame globals and the linecache module to efficiently turn that into the correct source code line:
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, traceback.tb_frame.f_globals)
This is what the traceback module uses to turn a traceback into a useful piece of information, in any case.
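Putting those pieces together, a minimal sketch might look like this (the helper name describe_innermost and the divide function are illustrative, not part of any library). It walks to the innermost traceback entry and fetches the offending source line; if the source isn't available, for example in an interactive session, linecache.getline returns an empty string.

```python
import linecache
import sys

def describe_innermost(tb):
    """Return (function name, filename, line number, source line) for the
    innermost frame of a traceback object."""
    while tb.tb_next is not None:  # walk to the frame where the error occurred
        tb = tb.tb_next
    frame = tb.tb_frame
    filename = frame.f_code.co_filename
    lineno = tb.tb_lineno
    linecache.checkcache(filename)
    # f_globals lets linecache fetch source through a module loader if needed
    line = linecache.getline(filename, lineno, frame.f_globals).strip()
    return frame.f_code.co_name, filename, lineno, line

def divide(a, b):
    return a / b

try:
    divide(1, 0)
except ZeroDivisionError:
    info = describe_innermost(sys.exc_info()[2])
    print(info)
```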
Since bytecode only has a line number associated with it, you cannot directly trace the bytecode back to the precise part of a source line; you'd have to parse that line yourself to determine what bytecode each part would emit, then match that with the bytecode of the code object.
You could do that with the ast module, but you can't do that on a line-by-line basis as you'd need scope context to generate the correct bytecodes for local versus cell versus global name look-ups, for example.
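As a first step in that direction, ast can at least tell you which call expressions start on a given line and at which column. The helper calls_on_line below is a hypothetical sketch: it lists candidate call sites but does not by itself say which one a given tb_lasti corresponds to. It relies on ast.get_source_segment, available since Python 3.8.

```python
import ast

def calls_on_line(source, lineno):
    """List (column offset, source text) for every call expression that
    starts on the given 1-based line."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and node.lineno == lineno:
            found.append((node.col_offset, ast.get_source_segment(source, node)))
    return sorted(found)  # order by column

source = "x = add(twice(1), twice(2))\n"
print(calls_on_line(source, 1))
# [(4, 'add(twice(1), twice(2))'), (8, 'twice(1)'), (18, 'twice(2)')]
```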
Unfortunately, compiled bytecode has lost its column offsets (this holds for CPython before 3.11; PEP 657 added per-instruction column positions in 3.11); the bytecode index to line number mapping is contained in the co_lnotab line number table. The dis module is a nice way of looking at the bytecode and interpreting co_lnotab:
>>> dis.dis(compile('a, b, c', '', 'eval'))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 LOAD_NAME                2 (c)
              9 BUILD_TUPLE              3
             12 RETURN_VALUE
  ^-- line number
However, there's nothing stopping us from messing with the line number:
>>> import ast
>>> a = ast.parse('a, b, c', mode='eval')
>>> for n in ast.walk(a):
...     if hasattr(n, 'col_offset'):
...         n.lineno = n.lineno * 1000 + n.col_offset
...
>>> dis.dis(compile(a, '', 'eval'))
1000           0 LOAD_NAME                0 (a)
1003           3 LOAD_NAME                1 (b)
1006           6 LOAD_NAME                2 (c)
               9 BUILD_TUPLE              3
              12 RETURN_VALUE
Since compiling code directly should be the same as compiling via ast.parse, and since messing with line numbers shouldn't affect the generated bytecode (other than the co_lnotab), you should be able to:
locate the source file
parse it with ast.parse
munge the line numbers in the ast to include the column offsets
compile the ast
use the tb_lasti to search the munged co_lnotab
convert the munged line number back to (line number, column offset)
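On CPython 3.11 and later there is a more direct route than munging line numbers: PEP 657 added source positions (start and end line and column) to every bytecode instruction, exposed as Instruction.positions by dis and as code.co_positions(). The sketch below, with the hypothetical helper call_sites, collects the source span of each CALL instruction; on 3.11+ it should yield the spans of twice(1), twice(2) and the outer add(...) call. To map a traceback, one would presumably look for the instruction whose offset matches tb_lasti. It assumes single-line ASCII source (columns are UTF-8 byte offsets), and the CALL opcode name is a CPython implementation detail.

```python
import dis
import sys

def call_sites(source):
    """Return the source text spanned by each CALL instruction, in bytecode
    order, or None on Pythons older than 3.11 (no column positions)."""
    if sys.version_info < (3, 11):
        return None
    code = compile(source, "<demo>", "exec")  # compiled only, never run
    lines = source.splitlines()
    spans = []
    for ins in dis.get_instructions(code):
        if ins.opname != "CALL":  # CPython-specific opcode name
            continue
        pos = ins.positions  # (lineno, end_lineno, col_offset, end_col_offset)
        # Assumes the call fits on one line and the source is ASCII
        line = lines[pos.lineno - 1]
        spans.append(line[pos.col_offset:pos.end_col_offset])
    return spans

print(call_sites("add(twice(1), twice(2))\n"))
```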
I know it's necromancy, but I posted a similar question yesterday without seeing this one first. So, in case someone is interested, I solved my problem differently from the accepted answer, by using the inspect and ast modules in Python 3. It's still only for debugging and educational purposes, but it does the trick.
The answer is rather long so here is the link
That's how I finally solved the problem: I instrumented each function call in the original program by wrapping it in a call to a helper function together with information about the source location of the original call. Actually I was interested in controlling the evaluation of each subexpression in the program, so I wrapped each subexpression.
More precisely: when I had an expression e in the original program, it became
_after(_before(location_info), e)
in the instrumented program. The helpers were defined like this:
def _before(location_info):
    return location_info

def _after(location_info, value):
    return value
When the tracer reported the call to _before, I knew that it was about to evaluate the expression at the location represented by location_info (the tracing system gives me access to local variables/parameters; that's how I learned the value of location_info). When the tracer reported the call to _after, I knew that the expression indicated by location_info had just been evaluated and that its value was in value.
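The mechanism can be sketched with sys.settrace: a global trace function receives a "call" event for every new frame, and frame.f_locals at that point holds the arguments. In this sketch, program and the location string "line 1, col 9" are made up; they stand for an instrumented version of result = 2 * 3 + 1 in which only the subexpression 2 * 3 was wrapped.

```python
import sys

# Helpers as described above: they only exist to trigger "call" events
def _before(location_info):
    return location_info

def _after(location_info, value):
    return value

events = []

def tracer(frame, event, arg):
    # Global trace function: called with event "call" for each new frame
    if event == "call" and frame.f_code.co_name in ("_before", "_after"):
        events.append((frame.f_code.co_name, dict(frame.f_locals)))
    return None  # no per-line tracing needed

# Hypothetical instrumented form of `result = 2 * 3 + 1`
def program():
    return _after(_before("line 1, col 9"), 2 * 3) + 1

sys.settrace(tracer)
try:
    result = program()
finally:
    sys.settrace(None)

print(result)  # 7
for name, local_vars in events:
    print(name, local_vars)
```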
I could have written the execution "event handling" right into those helper functions and bypass the tracing system altogether, but I needed it for other reasons as well, so I used those helpers only for triggering a "call" event in tracing system.
The result can be seen here: http://thonny.org
I was testing some code to answer Possible to turn an input string into a callable function object in Python? and the question got closed as duplicate but I could not find the answer to that question in the source answers.
There are many answers on SO similar to this, but none address how to return a function from a string. exec() doesn't return a function; it just executes arbitrary code.
My question is: is this the right approach to convert a string to a function and return it as a function object?
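my_string = "def add5(x):return x + 5"

def string_to_func(my_string):
    # relies on exec() writing into the dict returned by locals(); since
    # Python 3.13 (PEP 667) that is a fresh snapshot, so this finds nothing
    exec(my_string)
    for var_name, var in locals().items():
        if callable(var):
            return var
    print("no callable in string")
    #fallback code.

add5 = string_to_func(my_string)
print(add5(3))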
P.S.: I will delete this question if the original question gets reopened.
Roughly speaking, if you assume that the inputs are fully trusted, that the provided string is valid Python syntax that will produce a single callable function, and that exec cannot be used, something like this can be provided:
import ast
import types

def string_to_function(source):
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError('provided code fragment is not a single function')
    co = compile(tree, 'custom.py', 'exec')
    # first constant should be the code object for the function
    return types.FunctionType(co.co_consts[0], {})
Example:
f = string_to_function("""
def my_function(x):
    return x + 5
""")
print('f = %d' % f(5))
Output
f = 10
The code ensures that the provided source is a single function definition, and makes assumptions about the organisation of the generated bytecode (i.e. it only works with the compiler built into current versions of Python, where the generated bytecode places the code object for the single function defined in the source at index 0 of co_consts). The previous version of this answer used exec (of which the questioner "is not a big fan anyway"), binding the result into a specific target; that method should be more reliable because it does not depend on this lower-level structure, and the sanity checks using the ast module shown here could be added to that original version as well.
Also note that this answer is only applicable for Python 3+.
Further Edit:
I am actually still a little miffed by the remark that exec is assumed to execute arbitrary code (even on fixed inputs), simply because what it actually does is often misunderstood. In this particular case, if it is verified that only specific source is accepted, that doesn't necessarily mean every statement is executed immediately, and this is especially true here. My original, lazy answer didn't properly guarantee this, but understanding what the framework actually does is important for anything that involves dynamic compilation of code within the language. Since a greater level of "safety" is desired (executing the function immediately after the fact would negate it, which is why I didn't implement that originally), the edit now uses ast, and it is objectively better.
So what exactly does calling exec(co) do, which is essentially what the original example did when given the source of a single function definition? It can be determined by looking at the bytecode, like so:
>>> dis.dis(co)
  2           0 LOAD_CONST               0 (<code object my_function at 0x7fdbec44c420, file "custom.py", line 2>)
              2 LOAD_CONST               1 ('my_function')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (my_function)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE
All it does is load the code object, turn it into a proper function and assign the result to my_function (in the currently relevant scope); essentially, that's it. So yes, the correct way is to verify that the source is definitely a function definition, as done here (verification by checking the AST is safer than a more naive check that only one statement is present), then run exec on a specific dict and extract the assignment from there. Using exec this way is not inherently less (or more) safe in this instance, given that any function that was provided would be executed immediately anyway.
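A minimal sketch of that exec-on-a-dict variant, keeping the AST check so that only a single def is accepted. It looks the function up by the name recorded in the AST instead of relying on the layout of co_consts. Note that decorators and default-argument expressions are still evaluated when the def runs, so the assumption of trusted input still applies.

```python
import ast

def string_to_function(source):
    """Compile `source`, which must be exactly one `def`, and return the
    resulting function object. Assumes the source is trusted."""
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("provided code fragment is not a single function")
    namespace = {}
    # Executing a lone `def` creates the function and binds its name only
    exec(compile(tree, "<string_to_function>", "exec"), namespace)
    return namespace[tree.body[0].name]

add5 = string_to_function("def add5(x):\n    return x + 5\n")
print(add5(3))  # 8
```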
I have read the following posts but I am still unsure of something.
Python Compilation/Interpretation Process
Why python compile the source to bytecode before interpreting?
If I have a single Python file myfunctions.py containing the following code:
x = 3
def f():
    print(x)
    x = 2
Then running $ python myfunctions.py works perfectly fine.
But now make one small change to the above file. The new file looks as shown below.
x = 3
def f():
    print(x)
    x = 2
f() # there is a function call now
This time, the code raises an error. Now, I am trying to understand this behavior, and so far these are my conclusions.
Python creates bytecode for x=3.
It creates a function object f, quickly scans it and has bytecode that records the local variables within f's scope, but note that the bytecode for all statements in Python is unlikely to have been constructed yet.
Now, Python encounters a function call; it knows this function call is legitimate because the bare minimum bytecode describing the function object f and its local variables is present.
Now the interpreter takes charge of executing the bytecode, but from the initial footprint it knows x is a local variable here and says: "Why are you printing before you assign?"
Can someone please comment on this? Thanks in advance. And sorry if this has been addressed before.
When the interpreter reads a function, for each "name" (variable) it encounters, it decides whether that name is local or non-local. The criterion it uses is pretty simple: is there an assignment statement to that name anywhere in the body (barring global statements)? e.g.:
def foo():
    x = 3 # interpreter will tag `x` as a local variable since we assign to it here.
If there is an assignment statement to that name, then the name is tagged as "local", otherwise, it gets tagged as non-local.
Now, in your case, you try to print a variable which was tagged as local, but you do so before you've actually reached the critical assignment statement. Python looks for a local name, but doesn't find it so it raises the UnboundLocalError.
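A small sketch reproducing the situation from the question in Python 3 syntax: inspecting the code object shows that x was already classified as local when the def statement ran, before f was ever called, and calling f then raises UnboundLocalError (the exact message differs between Python versions).

```python
x = 3

def f():
    print(x)  # compiled as a local lookup because of the assignment below
    x = 2

# Decided when the def statement ran, before any call
print(f.__code__.co_varnames)  # ('x',)

caught = False
try:
    f()
except UnboundLocalError as exc:
    caught = True
    print("UnboundLocalError:", exc)
```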
Python is very dynamic and allows you to do lots of crazy things, which is part of what makes it so powerful. The downside is that it becomes very difficult to check for these errors unless you actually run the function. In fact, Python has made the decision not to check anything other than syntax until the function is run. This explains why you never see the exception until you actually call your function.
If you want Python to tag the variable as global, you can do so with an explicit global statement[1]:
x = 3
def foo():
    global x
    print(x)
    x = 2
foo() # prints 3
print(x) # prints 2
[1] Python 3.x takes this concept even further and introduces the nonlocal keyword.
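For example, in this sketch (Python 3 only; make_counter is just an illustrative name), the inner function rebinds a variable of the enclosing function. Without the nonlocal declaration, count += 1 would make count local to increment, and the first call would raise UnboundLocalError.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the enclosing function's variable
        count += 1
        return count

    return increment

counter = make_counter()
counter()
print(counter())  # 2
```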
mgilson got half of the answer.
The other half is that Python doesn't go looking for errors beyond syntax errors in functions (or function objects) it is not about to execute. So in the first case, since f() doesn't get called, the order-of-operations error isn't checked for.
In this respect, it is not like C and C++, which require everything to be fully declared up front. It's kind of like C++ templates, where errors in template code might not be found until the code is actually instantiated.
x = 4
def test():
    print(x)
    x = 2
test()
This gives an error because when you go to print(x), it sees that you have x declared in the scope of the function test, and it tells you you're trying to reference it without having declared it.
I know that if I do global x it's no problem, or if I move the print statement...I know.
But I don't understand how the interpreter knows that I have x redeclared after the print statement if it goes through the code one line at a time. How can it know what's coming?
There's clearly more to this than I'm aware of.
Who told you Python is executed one line at a time? Python is executed one bytecode at a time. And that bytecode comes from the compiler, which operates one statement at a time. Statements can be multiple lines. And a function definition is a statement.
So, one of the first steps in compiling a function definition is to gather up all of the variables assigned within that function's body. Any variable that's assigned, but doesn't have a global or nonlocal declaration, is a local.
(As a side note, that function body isn't actually compiled into a function, it's compiled into a code object, which gets stashed somewhere, and only run when you call the function, and into some bytecode that builds a function object out of that code object, which gets run where your function definition occurs in the normal order.)
You can, in fact, see what the compiler made of your function by looking at its members:
>>> def foo():
...     global y
...     x=1
...     y=1
...
>>> foo.__code__.co_varnames
('x',)
Then, when it's creating the bytecode for your function body, all variables in co_varnames are compiled into local lookups, while the rest are compiled into global lookups:
>>> dis.dis(foo)
  3           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)
  4           6 LOAD_CONST               1 (1)
              9 STORE_GLOBAL             0 (y)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE
Python does execute instructions one at a time; it's just that a function definition is a single instruction. When the Python interpreter encounters a function definition, it compiles the function: first transforming it into an abstract syntax tree (AST), then into bytecode. It's during this process that Python "looks ahead" and sees which variables should be considered local (by scanning the AST and seeing what names are assigned to but not declared global).
When the function is called it's executed an instruction at a time, but the compilation process is free to consider the entire function because it's considered a single operation. This is useful for various optimizations as well.
If you look at this program as three discrete operations:
x = 4 # variable assignment
def test(): foo # function definition
test() # function call
it makes a bit more sense. The interpreter processes the function definition - and that entails figuring the scope of variables, etc, hence your error.
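The standard library's symtable module exposes the result of that scope analysis without running anything, which makes the "look ahead" visible. A sketch, analysing the source from the question (passed as a string, so nothing is executed):

```python
import symtable

source = """\
x = 4
def test():
    print(x)
    x = 2
"""

# Build the compiler's symbol tables without executing the code
module_table = symtable.symtable(source, "<demo>", "exec")
test_table = next(t for t in module_table.get_children() if t.get_name() == "test")

x_symbol = test_table.lookup("x")
print(x_symbol.is_local(), x_symbol.is_global())  # True False
print(test_table.lookup("print").is_global())     # True
```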
I am using pdb to debug a program. I successively hit 'c' to run through the code and at each step pdb shows me which line is executed.
Let's say we have this code:
def foo(bar):
    print(bar)

foo('hey')
First, line 4 calls function foo. Then pdb shows me the line
def foo(bar)
is executed.
Why? Isn't that line just a kind of label? What happens before "print(bar)" is executed? (That comes with another 's' hit.)
EDIT: From experimenting, it seems that one thing done here is to actually check the definition. In fact, if foo were a generator (which cannot be called in such a way), Python still stops there and then decides to treat it as a generator (or as a function, depending on the case).
def is not a declaration in Python, it's an executable statement. At runtime it retrieves the code object compiled for the function, wraps that in a dynamically created function object, and binds the result to the name following the def. For example, consider this useless code:
import dis

def f():
    def g():
        return 1

dis.dis(f)
Here's part of the output (Python 2.7.5 here):
  0 LOAD_CONST               1 (<code object g at 02852338, file ...>)
  3 MAKE_FUNCTION            0
  6 STORE_FAST               0 (g)
All this is usually an invisible detail, but you can play some obscure tricks with it ;-) For example, think about what this code does:
fs = []
for i in range(3):
    def f(arg=i**3):
        return arg
    fs.append(f)
print([f() for f in fs])
Here's the output:
[0, 1, 8]
That's because the executable def creates three distinct function objects, one for each time through the loop. Great fun :-)
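Extending the loop example (in Python 3 syntax), you can also check that each pass through def builds a new function object around one and the same compiled code object, which matches the description of what def does at runtime:

```python
fs = []
for i in range(3):
    def f(arg=i ** 3):  # default evaluated each time the def statement runs
        return arg
    fs.append(f)

print([g() for g in fs])                 # [0, 1, 8]
print(fs[0] is fs[1])                    # False: distinct function objects
print(fs[0].__code__ is fs[1].__code__)  # True: one shared code object
```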
What happens before "print(bar)" is executed?
This is just an educated guess: I suppose the current IP is pushed onto the stack and then the parameters. Then a new stack frame is created, the parameters are popped from stack and added as locals to the current scope. Something along this line.
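A complementary way to look at it: pdb is built on the sys.settrace hook (through the bdb module), and when a function is entered the trace function receives a "call" event whose frame still reports the def line (co_firstlineno) as its current line. pdb shows that event as --Call-- on the def line, before the first "line" event for the body. The sketch below records those events for a small foo; the tracer function is illustrative.

```python
import sys

events = []

def tracer(frame, event, arg):
    if frame.f_code.co_name == "foo":
        # record the event and the line, relative to the def line
        events.append((event, frame.f_lineno - frame.f_code.co_firstlineno))
        return tracer  # also receive "line" and "return" events inside foo
    return None

def foo(bar):
    return bar.upper()

sys.settrace(tracer)
try:
    foo("hey")
finally:
    sys.settrace(None)

print(events)
```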