I am using pdb to debug a program. I successively hit 's' to step through the code, and at each step pdb shows me which line is executed.
Let's say we have this code:
def foo(bar):
    print(bar)

foo('hey')
First, line 4 calls function foo. Then pdb shows me the line
def foo(bar)
is executed.
Why? Isn't that line just a kind of label? What happens before "print(bar)" is executed? (That comes with another 's' hit.)
EDIT: I found by experimenting that something actually done there is checking the definition. In fact, if foo were a generator (which cannot be called in quite the same way), Python would still stop at that line and only then decide to treat it as a generator (or as a function, depending on the case).
def is not a declaration in Python; it's an executable statement. At runtime it retrieves the code object compiled for the function, wraps that in a dynamically created function object, and binds the result to the name following the def. For example, consider this useless code:
import dis
def f():
    def g():
        return 1
dis.dis(f)
Here's part of the output (Python 2.7.5 here):
0 LOAD_CONST 1 (<code object g at 02852338, file ...>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (g)
All this is usually an invisible detail, but you can play some obscure tricks with it ;-) For example, think about what this code does:
fs = []
for i in range(3):
    def f(arg=i**3):
        return arg
    fs.append(f)

print [f() for f in fs]
Here's the output:
[0, 1, 8]
That's because the executable def creates three distinct function objects, one for each time through the loop. Great fun :-)
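For contrast, a sketch of my own (not part of the original answer): drop the default argument and all three functions share the same i, which is looked up only when they're called:
fs = []
for i in range(3):
    def f():
        return i ** 3  # i is looked up when f() runs, not when f is defined
    fs.append(f)

print [f() for f in fs]  # [8, 8, 8] -- every f sees the final i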
What happens before "print(bar)" is executed?
This is just an educated guess: I suppose the current instruction pointer is pushed onto the stack, followed by the parameters. Then a new stack frame is created, and the parameters are popped from the stack and added as locals to the new scope. Something along those lines.
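You can watch the actual sequence of events with sys.settrace, the hook pdb itself is built on. A minimal sketch (my own illustration, not from the answer):
import sys

def foo(bar):
    print(bar)

def tracer(frame, event, arg):
    # The 'call' event fires when the new frame is created, before any line
    # of the body has run; at that moment the frame's current line is the
    # `def` line, which is why pdb shows it when you step into a call.
    print(event, frame.f_code.co_name, frame.f_lineno)
    return tracer

sys.settrace(tracer)
foo('hey')
sys.settrace(None)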
Related
When inside tracing function, debugging a function call, is it possible to somehow retrieve the calling expression?
I can get the calling line number from the traceback object, but if there are several function calls (possibly to the same function) on that line (e.g. as subexpressions in a bigger expression), how could I learn which call this came from? I would be happy even with the offset from the start of the source line.
traceback.tb_lasti seems to give more granular context (the index of the last bytecode instruction tried). Is it somehow possible to connect a bytecode to its exact source range?
EDIT: Just to clarify -- I need to extract specific (sub)expression (the callsite) from the calling source line.
Traceback frames have a line number too:
lineno = traceback.tb_lineno
You can also reach the code object, which will have a name, and a filename:
name = traceback.tb_frame.f_code.co_name
filename = traceback.tb_frame.f_code.co_filename
You can use the filename and line number, plus the frame globals and the linecache module to efficiently turn that into the correct source code line:
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, traceback.tb_frame.f_globals)
This is what the traceback module uses to turn a traceback into a useful piece of information, in any case.
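Putting those pieces together, a minimal sketch (my code; report_failing_line is a made-up name):
import linecache
import sys

def report_failing_line(tb):
    # Walk to the frame the traceback points at and fetch its source line.
    frame = tb.tb_frame
    lineno = tb.tb_lineno
    filename = frame.f_code.co_filename
    linecache.checkcache(filename)  # refresh the cache if the file changed
    line = linecache.getline(filename, lineno, frame.f_globals)
    print('%s:%s in %s: %s' % (filename, lineno, frame.f_code.co_name, line.strip()))

try:
    1 / 0
except ZeroDivisionError:
    report_failing_line(sys.exc_info()[2])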
Since bytecode only has a line number associated with it, you cannot directly trace the bytecode back to the precise part of a source line; you'd have to parse that line yourself to determine what bytecode each part would emit, then match that against the bytecode of the code object.
You could do that with the ast module, but you can't do that on a line-by-line basis as you'd need scope context to generate the correct bytecodes for local versus cell versus global name look-ups, for example.
Unfortunately, compiled bytecode has lost its column offsets; the bytecode index to line number mapping is contained in the co_lnotab line number table. The dis module is a nice way of looking at the bytecode and interpreting co_lnotab:
>>> dis.dis(compile('a, b, c', '', 'eval'))
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 LOAD_NAME 2 (c)
9 BUILD_TUPLE 3
12 RETURN_VALUE
^-- line number
However, there's nothing stopping us from messing with the line number:
>>> a = ast.parse('a, b, c', mode='eval')
>>> for n in ast.walk(a):
...     if hasattr(n, 'col_offset'):
...         n.lineno = n.lineno * 1000 + n.col_offset
...
>>> dis.dis(compile(a, '', 'eval'))
1000 0 LOAD_NAME 0 (a)
1003 3 LOAD_NAME 1 (b)
1006 6 LOAD_NAME 2 (c)
9 BUILD_TUPLE 3
12 RETURN_VALUE
Since compiling code directly should be the same as compiling via ast.parse, and since messing with line numbers shouldn't affect the generated bytecode (other than co_lnotab), you should be able to do the following (sketched in code after this list):
locate the source file
parse it with ast.parse
munge the line numbers in the ast to include the column offsets
compile the ast
use the tb_lasti to search the munged co_lnotab
convert the munged line number back to (line number, column offset)
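A rough, untested sketch of those steps (call_position and _find are my names, not an existing API; it assumes the file on disk still matches the bytecode that raised):
import ast
import dis
import types

def _find(code, name):
    # Recursively search co_consts for the named function's code object.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            if const.co_name == name:
                return const
            found = _find(const, name)
            if found:
                return found
    return None

def call_position(tb):
    code = tb.tb_frame.f_code
    with open(code.co_filename) as f:
        tree = ast.parse(f.read(), code.co_filename)
    for node in ast.walk(tree):  # munge: line * 1000 + column
        if hasattr(node, 'col_offset'):
            node.lineno = node.lineno * 1000 + node.col_offset
    munged = compile(tree, code.co_filename, 'exec')
    target = munged if code.co_name == '<module>' else _find(munged, code.co_name)
    last = 0
    for offset, lineno in dis.findlinestarts(target):
        if offset > tb.tb_lasti:  # past the instruction that raised
            break
        last = lineno
    return divmod(last, 1000)  # (line number, column offset)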
I know it's necromancy, but I posted a similar question yesterday without seeing this one first. So, just in case someone is interested: I solved my problem in a different way than the accepted answer, by using the inspect and ast modules in Python 3. It's still for debugging and educational purposes, but it does the trick.
The answer is rather long, so here is the link
That's how I finally solved the problem: I instrumented each function call in the original program by wrapping it in a call to a helper function together with information about the source location of the original call. Actually I was interested in controlling the evaluation of each subexpression in the program, so I wrapped each subexpression.
More precisely: when I had an expression e in the original program, it became
_after(_before(location_info), e)
in the instrumented program. The helpers were defined like this:
def _before(location_info):
    return location_info

def _after(location_info, value):
    return value
When the tracer reported the call to _before, I knew it was about to evaluate the expression at the location represented by location_info (the tracing system gives me access to local variables/parameters; that's how I got the value of location_info). When the tracer reported the call to _after, I knew that the expression indicated by location_info had just been evaluated and that its value was in value.
I could have written the execution "event handling" right into those helper functions and bypass the tracing system altogether, but I needed it for other reasons as well, so I used those helpers only for triggering a "call" event in tracing system.
The result can be seen here: http://thonny.org
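For illustration, here is a minimal sketch of such an instrumentation pass using ast.NodeTransformer (my reconstruction, not the actual Thonny code); it wraps only call expressions and encodes the location as a (line, column) tuple:
import ast

class WrapCalls(ast.NodeTransformer):
    # Rewrites every call  e(...)  into  _after(_before(loc), e(...)).
    def visit_Call(self, node):
        self.generic_visit(node)  # instrument nested calls first
        loc = ast.Constant(value=(node.lineno, node.col_offset))
        before = ast.Call(func=ast.Name(id='_before', ctx=ast.Load()),
                          args=[loc], keywords=[])
        after = ast.Call(func=ast.Name(id='_after', ctx=ast.Load()),
                         args=[before, node], keywords=[])
        return ast.copy_location(after, node)

tree = ast.parse("print(len('abc'))")
tree = ast.fix_missing_locations(WrapCalls().visit(tree))
print(ast.unparse(tree))  # ast.unparse needs Python 3.9+
# _after(_before((1, 0)), print(_after(_before((1, 6)), len('abc'))))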
Suppose I have two functions one after another in the same python file:
def A(n):
    B(n-1)

# if I add A(1) here, it gives me an error

def B(n):
    if n <= 0:
        return
    else:
        A(n-1)
When the interpreter is reading A, B is not yet defined, however this code does not give me an error. This confuses me, because I thought that Python programs are interpreted line by line. How come the attempt to call B within A doesn't give an error immediately, before anything is called?
My understanding is that when def is interpreted, Python adds an entry to some local namespace, locals(), with {"function name": function address}, but as for the function body, it only does a syntax check:
def A():
    # this will give an error as it isn't a valid expression
    syntax error

def B():
    # even though x is not defined, this does not give an error
    print(x)

# same as above, NameError is only detected during runtime
A()
Do I have it right?
The line B(n-1) says "when this statement is executed, look up some function B in the module scope, then call it with the argument n-1". Since the lookup happens when the function is executed, B can be defined later.
(Additionally, you can completely overwrite B with a different function, and A will call the new B afterwards. But that can lead to some confusing code.)
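For example, a small demo of my own:
def A(n):
    B(n - 1)

def B(n):
    print('first B:', n)

A(5)  # prints "first B: 4"

def B(n):  # rebinds the module-level name B
    print('second B:', n)

A(5)  # now prints "second B: 4" -- A picked up the new B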
If you're worried about not catching calls to nonexistent functions, you can try using static analysis tools. Other than that, be sure you're testing your code.
A SyntaxError will be caught at compile time, but most other errors (NameError, ValueError, etc.) will be caught only at runtime, and then only if that function is called.
"if I have written a function, if its not called in my test.." - and that is why you should test everything.
Some IDEs will raise warnings in various situations, but the best option is still to conduct thorough testing yourself. This way, you can also check for errors that arise through factors like user input, which an IDE's automated checks won't cover.
When the interpreter is reading A, B is not yet defined, however this code does not give me an error
The reason the Python interpreter doesn't give an error can be found in the docs; technically, this is known as a forward reference:
Name resolution of free variables occurs at runtime, not at compile time.
Considering the first example code specifically:
In Python, def A(): is an executable statement. It is evaluated by using the code inside the block to create a function object, and then assigning that object to the name A. This does more than just a "syntax check"; it actually compiles the code (in the reference implementation, this means it produces bytecode for the Python VM). That process involves determining that B i) is a name, and ii) will be looked up globally (because there is no assignment to B within A).
We can see the result using the dis module from the standard library:
>>> def A(n):
... B(n-1)
...
>>> import dis
>>> dis.dis(A)
2 0 LOAD_GLOBAL 0 (B)
2 LOAD_FAST 0 (n)
4 LOAD_CONST 1 (1)
6 BINARY_SUBTRACT
8 CALL_FUNCTION 1
10 POP_TOP
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
This result is version-specific and implementation-dependent; I have shown what I get using the reference implementation of Python 3.8. As you can infer, the LOAD_GLOBAL opcode represents the instruction to look for B in the global namespace (and, failing that, in the special built-in names).
However, none of the code actually runs until the function is called. So it does not matter that B isn't defined in the global namespace yet; it will be by the time it's needed.
From comments:
but why python does not check for NameError at compile time?
Because there is nothing sensible to check the names against. Python is a dynamic language; the entire point is that you have the freedom to Do Whatever It Takes to ensure that B is defined before A uses it. Including, for example, extremely bad ideas like downloading another Python file from the Internet and dynamically executing it in the current global namespace.
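A tamer version of the same idea, just for illustration:
def A(n):
    B(n - 1)

# B does not exist yet; define it at runtime from a string.
# (Imagine the string arrived from anywhere -- that's the point.)
exec("def B(n):\n    print('B got', n)")

A(3)  # prints "B got 2"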
I am debugging method f() that has no return in it.
class A(object):
    def __init__(self):
        self.X = []

    def f(self):
        for i in range(10):
            self.X.append(i)
I need to see how this method modifies variable X right after it is called. To do that, I insert a return at the end of the method, and set the breakpoint there:
That way, as soon as the method reaches its return, I can see the value of my variable X.
This does the job, but I am pretty sure there is a better way. Editing a method or function every time I need to debug it seems silly.
Question:
Is there a different way (e.g. an option in the debugger) to set a breakpoint at the end of a method that does not have a return?
(Note that setting a breakpoint at the function call and using Step Over would not display X when mouseovering, since the function is called from a different module.)
You can add a conditional breakpoint on the last line and set the condition to be something that occurs only in the last iteration.
In this instance the condition is very easy since it's just i == 9, but it may be a lot more complex depending on your loop condition so sometimes adding a statement at the end will be the easier solution.
That screenshot is from IntelliJ IDEA and your screenshot looks like it's from the same IDE, so just right-click the breakpoint to show the dialog and enter your condition.
If you're using some other IDE I'm sure there is capability to make a breakpoint conditional.
Update:
There is no support for breaking at the end of a method in the Python debugger, only at the start of a method:
b(reak) [[filename:]lineno | function[, condition]]
With a lineno argument, set a break there in the current file. With a function argument, set a break at the first executable statement within that function. The line number may be prefixed with a filename and a colon, to specify a breakpoint in another file (probably one that hasn't been loaded yet). The file is searched on sys.path. Note that each breakpoint is assigned a number to which all the other breakpoint commands refer.
If a second argument is present, it is an expression which must evaluate to true before the breakpoint is honored.
Without argument, list all breaks, including for each breakpoint, the number of times that breakpoint has been hit, the current ignore count, and the associated condition if any.
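Given that syntax, the IDE's conditional breakpoint does have a plain-pdb equivalent. Assuming the self.X.append(i) line sits on line 7 of a file called test.py (both are my assumptions), it would look something like:
(Pdb) break test.py:7, i == 9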
Your IDE is hiding what's under the hood.
That is, something like
import pdb
is prepended to your script and
pdb.set_trace()
is inserted before the line onto which you placed your breakpoint.
From what you say, I deduce that PyCharm does not like placing breakpoints on empty lines. However, pdb.set_trace() can perfectly well be placed at the end of a method.
So you could insert those yourself (or write a macro) and run python -m pdb to start debugging.
(Edit) example
import pdb

class A(object):
    def __init__(self):
        self.X = []

    def f(self):
        for i in range(10):
            self.X.append(i)
        pdb.set_trace()

if __name__ == '__main__':
    a = A()
    a.f()
Debug with
$ python -m pdb test.py
> /dev/test.py(1)<module>()
----> 1 import pdb
2
3 class A(object):
ipdb> cont
--Return--
> /dev/test.py(11)f()->None
-> pdb.set_trace()
(Pdb) self.X
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(Pdb)
ipdb can be used instead of pdb.
With pdb you can use a nice combination of break function and until lineno:
Without argument, continue execution until the line with a number
greater than the current one is reached.
With a line number, continue execution until a line with a number
greater or equal to that is reached. In both cases, also stop when the
current frame returns.
Changed in version 3.2: Allow giving an explicit line number.
You can achieve what you needed.
I modified your example a bit (so you can see that the instruction gets executed, even though pdb reports it as the "next instruction"):
01: class A(object):
02:
03: def __init__(self):
04: self.X = []
05:
06: def f(self):
07: print('pre exec')
08: for i in range(10):
09: self.X.append(i)
10: print('post exec')
11:
12: a = A()
13: a.f()
14: print('Game is over')
15:
And the result of running python -m pdb test.py goes like this:
Start debugging and run to just past the class declaration (so you can add a named breakpoint):
> d:\tmp\stack\test.py(1)<module>()
-> class A(object):
(Pdb) until 11
> d:\tmp\stack\test.py(12)<module>()
-> a = A()
Now, break at the beginning of the function:
(Pdb) break A.f
Breakpoint 1 at d:\tmp\stack\test.py:6
Just continue execution until it hits the breakpoint:
(Pdb) continue
> d:\tmp\stack\test.py(7)f()
-> print('pre exec')
Take advantage of "also stop when the current frame returns":
(Pdb) until 14
pre exec
post exec
--Return--
As you can see, both pre exec and post exec were printed, but when executing w(here), you are still in f():
(Pdb) w
c:\python32\lib\bdb.py(405)run()
-> exec(cmd, globals, locals)
<string>(1)<module>()
d:\tmp\stack\test.py(13)<module>()
-> a.f()
> d:\tmp\stack\test.py(10)f()->None
-> print('post exec')
And all context variables are intact:
(Pdb) p self.X
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Now with your real life example:
01: class A(object):
02: def __init__(self):
03: self.X = []
04:
05: def f(self):
06: for i in range(10):
07: self.X.append(i)
08:
09: a = A()
10: a.f()
11: print('Game is over')
Start in a similar fashion as before:
> d:\tmp\stack\test.py(1)<module>()
-> class A(object):
(Pdb) until 8
> d:\tmp\stack\test.py(9)<module>()
-> a = A()
(Pdb) break A.f
Breakpoint 1 at d:\tmp\stack\test.py:5
(Pdb) cont
> d:\tmp\stack\test.py(6)f()
-> for i in range(10):
Now... a breakpoint on A.f actually means a breakpoint at the first statement of A.f, which is unfortunately for i in..., so it would break on it every time.
If your real code doesn't actually start with a loop, you can skip this part.
(Pdb) disable 1
Disabled breakpoint 1 at d:\tmp\stack\test.py:5
Again, use the until <end of file>:
(Pdb) until 10
--Return--
> d:\tmp\stack\test.py(6)f()->None
-> for i in range(10):
And again, all frame variables are available:
(Pdb) p i
9
(Pdb) w
c:\python32\lib\bdb.py(405)run()
-> exec(cmd, globals, locals)
<string>(1)<module>()
d:\tmp\stack\test.py(10)<module>()
-> a.f()
> d:\tmp\stack\test.py(6)f()->None
-> for i in range(10):
(Pdb)
The sad thing here is that I wanted to try this piece of automation:
(Pdb) break A.f
Breakpoint 1 at d:\tmp\stack\test.py:5
(Pdb) commands 1
(com) disable 1
(com) until 11
(com) end
This would do everything you need automatically (again, disable 1 is not needed when you have at least one pre-loop statement), but according to the documentation on commands:
Specifying any command resuming execution (currently continue, step, next, return, jump, quit and their abbreviations) terminates the command list (as if that command was immediately followed by end). This is because any time you resume execution (even with a simple next or step), you may encounter another breakpoint–which could have its own command list, leading to ambiguities about which list to execute.
So until just doesn't seem to work within commands (at least for Python 3.2.5 under Windows), and you have to do this by hand.
There is a quick&dirty solution that works on any language that supports monkeypatching (Python, Ruby, ObjC, etc.). I honestly can't remember ever needing it in Python, but I did it quite a bit in both SmallTalk and ObjC, so maybe it'll be useful for you.
Just dynamically wrap A.f in a function, like this:
real_A_f = A.f

def wrap_A_f(self, *args, **kwargs):
    result = real_A_f(self, *args, **kwargs)
    return result

A.f = wrap_A_f
In most scriptable debuggers, you should be able to write a script that does this automatically for a method by name. In pdb, which lets you execute normal Python code right in the debugger, it's especially simple.
Now you can put a breakpoint on that return result, and it's guaranteed to hit immediately after the real A.f returns (even if it returns in the middle or falls off the end without a return statement).
A few things you may want to add:
If you also want to catch A.f raising, put a try: and except: raise around the code, and add a breakpoint on the raise.
For Python 2.x, you may want to wrap that up with types.MethodType to make a real unbound method.
If you only want a breakpoint on a specific A instance, you can either use a conditional breakpoint that checks self is a, or use types.MethodType to create a bound method and store it as a.f.
You may want to use functools.wraps if you want to hide the wrapper from the rest of the code (and from your debugging, except in the cases where you really want to see it).
Since pdb lets you execute dynamic code right in the live namespace, you can put a wrap_method function somewhere in your project that does this, and then, at the prompt, write p utils.wrap_method(A, 'f'). But if you wrap multiple methods this way, they're going to share the same breakpoints (inside the wrapper function defined inside wrap_method). Here I think a conditional breakpoint is the only reasonable option.
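Such a wrap_method helper might look roughly like this (my sketch; only the name wrap_method comes from the paragraph above):
import functools

def wrap_method(cls, name):
    real = getattr(cls, name)
    @functools.wraps(real)  # hide the wrapper from introspection
    def wrapper(self, *args, **kwargs):
        result = real(self, *args, **kwargs)
        return result  # <-- set the shared breakpoint here
    setattr(cls, name, wrapper)
    return wrapper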
If you want access to the real A.f's locals from the wrapper's breakpoint, that's a lot harder. I can think of some very hacky options (e.g., exec(real_A_f.__code__, real_A_f.__globals__)), but nothing I'd be happy with.
You have a few options here.
Add a break point to the last line in the function.
In this case, the last line is within a loop, so you would have to iterate over each item in the loop.
Add a break point where the function is being called.
This will stop the debugger prior to the function being called, but you can "Step Over" the function to see the value of a.X after a.f() is called.
Add a temporary statement at the end of the function to break at
This trick would work if your function ends in a loop and there are multiple places the function is called or you don't want to track down the function call.
You can add a simple statement to the end of the function for debugging purposes and add a break point there.
def f(self):
    for i in range(10):
        self.X.append(i)
    debug_break = 1
Why not just leave the return in there? Or a return None? It's implicit anyway; the interpreter/compiler will do the same thing regardless:
In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name).
[source: Python Tutorial 4.6].
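A quick check (my example):
def f():
    pass

print(f())  # None -- falling off the end returns None implicitly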
I have read the following posts but I am still unsure of something.
Python Compilation/Interpretation Process
Why python compile the source to bytecode before interpreting?
If I have a single Python file myfunctions.py containing the following code.
x = 3

def f():
    print x
    x = 2
Then, saying $ python myfunctions.py runs perfectly fine.
But now make one small change to the above file. The new file looks as shown below.
x = 3

def f():
    print x
    x = 2

f()  # there is a function call now
This time, the code gives an error. Now I am trying to understand this behavior, and so far these are my conclusions:
Python creates bytecode for x=3
It creates a function object f; it quickly scans the body and produces bytecode that records the local variables within f's scope, though the bytecode for all statements may not have been fully constructed yet.
Now Python encounters a function call. It knows this call is legitimate because the bare-minimum bytecode describing the function object f and its local variables is present.
Now the interpreter takes charge of executing the bytecode, but from that initial footprint it knows x is a local variable here and says: "Why are you printing before you assign?"
Can someone please comment on this? Thanks in advance. And sorry if this has been addressed before.
When the interpreter reads a function, for each "name" (variable) it encounters, it decides whether that name is local or non-local. The criterion it uses is pretty simple: is there an assignment statement anywhere in the body to that name (barring global statements)? e.g.:
def foo():
    x = 3  # the interpreter will tag `x` as a local variable, since we assign to it here
If there is an assignment statement to that name, the name is tagged as "local"; otherwise, it gets tagged as non-local.
Now, in your case, you try to print a variable that was tagged as local, but you do so before you've actually reached the critical assignment statement. Python looks for a local name but doesn't find it, so it raises an UnboundLocalError.
Python is very dynamic and allows you to do lots of crazy things, which is part of what makes it so powerful. The downside is that it becomes very difficult to check for these errors unless you actually run the function. In fact, Python has made the decision not to check anything other than syntax until the function is run. This explains why you never see the exception until you actually call your function.
If you want python to tag the variable as global, you can do so with an explicit global1 statement:
x = 3

def foo():
    global x
    print x
    x = 2

foo()    # prints 3
print x  # prints 2
1 Python 3.x takes this concept even further and introduces the nonlocal keyword.
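A quick illustration of nonlocal (my example, not part of the original answer):
def counter():
    count = 0
    def bump():
        nonlocal count  # rebind the enclosing count instead of creating a local
        count += 1
        return count
    return bump

b = counter()
print(b(), b(), b())  # 1 2 3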
mgilson got half of the answer.
The other half is that Python doesn't go looking for errors beyond syntax errors in functions (or function objects) it is not about to execute. So in the first case, since f() doesn't get called, the order-of-operations error isn't checked for.
In this respect, it is not like C and C++, which require everything to be fully declared up front. It's kind of like C++ templates, where errors in template code might not be found until the code is actually instantiated.
x = 4

def test():
    print(x)
    x = 2

test()
This gives an error because when you go to print(x), it sees that you have x declared in the scope of the function test, and it tells you you're trying to reference it without having declared it.
I know that if I do global x it's no problem, or if I move the print statement...I know.
But I don't understand how the interpreter knows that I have x redeclared after the print statement if it goes through the code one line at a time. How can it know what's coming?
There's clearly more to this than I'm aware of.
Who told you Python is executed one line at a time? Python is executed one bytecode at a time. And that bytecode comes from the compiler, which operates one statement at a time. Statements can be multiple lines. And a function definition is a statement.
So, one of the first steps in compiling a function definition is to gather up all of the variables assigned within that function's body. Any variable that's assigned, but doesn't have a global or nonlocal declaration, is a local.
(As a side note, that function body isn't actually compiled into a function, it's compiled into a code object, which gets stashed somewhere, and only run when you call the function, and into some bytecode that builds a function object out of that code object, which gets run where your function definition occurs in the normal order.)
You can, in fact, see what the compiler made of your function by looking at its members:
>>> def foo():
...     global y
...     x=1
...     y=1
...
>>> foo.__code__.co_varnames
('x',)
Then, when it's creating the bytecode for your function body, all variables in co_varnames are compiled into local lookups, while the rest are compiled into global lookups:
>>> dis.dis(foo)
3 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (x)
4 6 LOAD_CONST 1 (1)
9 STORE_GLOBAL 0 (y)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
Python does execute instructions one at a time; it's just that a function definition is a single instruction. When the Python interpreter encounters a function definition, it compiles the function: first transforming it into an abstract syntax tree (AST), then into bytecode. It's during this process that Python "looks ahead" and sees which variables should be considered local (by scanning the AST and seeing what names are assigned to but not declared global).
When the function is called it's executed an instruction at a time, but the compilation process is free to consider the entire function because it's considered a single operation. This is useful for various optimizations as well.
If you look at this program as three discrete operations:
x = 4            # variable assignment
def test(): ...  # function definition
test()           # function call
it makes a bit more sense. The interpreter processes the function definition as a unit, and that entails figuring out the scope of its variables, etc.; hence your error.
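You can confirm this with dis (a sketch of mine; the exact opcodes vary by Python version):
import dis

x = 4

def test():
    print(x)
    x = 2

dis.dis(test)
# print(x) compiles to LOAD_FAST (a local lookup), so calling test()
# raises UnboundLocalError: the local slot for x is never filled first.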