I am trying to debug a Python built-in class. My debugging has brought me into the realm of magic methods (aka dunder methods).
I am trying to figure out which dunder methods are called, if any. Normally I would do something like this:
import sys
import traceback
# This would be located at the point I'm currently debugging
traceback.print_stack(file=sys.stdout)
However, traceback.print_stack does not give me the level of detail of showing which dunder methods are used in its vicinity.
Is there some way I can print out, in a very verbose manner, what is actually happening inside a block of code?
Sample Code
#!/usr/bin/env python3.6
import sys
import traceback
from enum import Enum


class TestEnum(Enum):
    """Test enum."""

    A = "A"


def main():
    for enum_member in TestEnum:
        traceback.print_stack(file=sys.stdout)
        print(f"enum member = {enum_member}.")


if __name__ == "__main__":
    main()
I would like the above sample code to print out any dunder methods used (ex: __iter__).
Currently it prints out the path to the call to traceback.print_stack:
/path/to/venv/bin/python /path/to/file.py
File "/path/to/file.py", line 56, in <module>
main()
File "/path/to/file.py", line 51, in main
traceback.print_stack(file=sys.stdout)
enum member = TestEnum.A.
P.S. I'm not interested in going to the byte code level given by dis.dis.
I think you are looking at the wrong place with the stack trace. When you call print_stack from a place that is only reached from within a dunder method, that dunder method is very much included in the output.
I tried this code to verify:
import sys
import traceback
from enum import Enum


class TestEnum(Enum):
    """Test enum."""

    A = "A"


class MyIter:
    def __init__(self):
        self.i = 0

    def __next__(self):
        self.i += 1
        if self.i <= 1:
            traceback.print_stack(file=sys.stdout)
            return TestEnum.A
        raise StopIteration

    def __iter__(self):
        return self


def main():
    for enum_member in MyIter():
        print(f"enum member = {enum_member}.")


if __name__ == "__main__":
    main()
The last line of the stack trace is printed as
File "/home/lydia/playground/demo.py", line 21, in __next__
traceback.print_stack(file=sys.stdout)
In your original code, you are getting the stack trace at a time when all dunder methods have already returned. Thus they have been removed from the stack.
So I think you want to look at a call graph instead. I know that IntelliJ / PyCharm can do this nicely, at least in the paid editions.
There are other tools that you may want to try. How does pycallgraph look to you?
Update:
Python makes it actually pretty easy to dump a plain list of all the function calls.
Basically all you need to do is
import sys
sys.setprofile(tracefunc)
Write the tracefunc depending on your needs. Find a working example at this SO question: How do I print functions as they are called
Warning: I needed to start the script from an external shell. Starting it by using the play button in my IDE meant that the script would never terminate but write more and more lines. I assume it collides with the internal profiling done by my IDE.
The official documentation of sys.setprofile: https://docs.python.org/3/library/sys.html#sys.setprofile
And a random tutorial about tracing in Python: https://pymotw.com/2/sys/tracing.html
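For illustration, a minimal tracefunc could look something like this sketch (my own example, not taken from the links above; main() stands in for whatever code you want to inspect). It simply prints every Python-level and C-level call, which is where the dunder methods show up:

import sys

def tracefunc(frame, event, arg):
    # 'call' fires for Python functions, 'c_call' for functions implemented in C
    if event == "call":
        code = frame.f_code
        print("call: {} ({}:{})".format(code.co_name, code.co_filename, frame.f_lineno))
    elif event == "c_call":
        print("c_call: {}".format(arg))

sys.setprofile(tracefunc)
main()                 # run the code you want to inspect
sys.setprofile(None)   # switch profiling off again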
Note, however, that in my experience you get the best insights into the questions "who is calling whom?" or "where does this value even come from?" by using a plain old debugger.
I also did some research on the subject matter, as information in @LydiaVanDyke's answer fueled better searches.
Printing Entire Call Stack
As @LydiaVanDyke points out, an IDE debugger is a really great way. I use PyCharm and found that it was my favorite solution, because one can:
Follow function calls + exact line numbers in the code
Read code around the calls, giving a better understanding of typing
Skip over calls one doesn't care to investigate
Another way is Python's standard library's trace. It offers both command line and embeddable methods for printing the entire call stack.
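For example, a rough sketch of both forms (assuming the script defines a main() function):

# command line: print every line as it is executed
#   python -m trace --trace file.py

# embedded: trace everything that happens while main() runs
import trace

tracer = trace.Trace(trace=1, count=0)
tracer.run('main()')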
And yet another one is Python's built-in debugger module, pdb. This (invoked via pdb.set_trace()) really changed the game for me.
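A minimal sketch of that (the surrounding function is just an example):

import pdb

def some_function():
    pdb.set_trace()   # execution pauses here and a (Pdb) prompt appears
    ...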
Visualization of Profiler Output
gprof2dot is another useful profiler visualization tool.
source code
useful tutorial
Finding Source Code
One of my other problems was not actually seeing the real source code, due to my IDE's stub files (PyCharm).
How to retrieve source code of Python functions details two methods of actually printing source code
With all this tooling, one feels quite empowered!
Related
When working with Python's asyncio package, I've noticed that I can't step into any code of its tasks.Task class. For example, when the calling code invokes the class's constructor, my next 'step into' gets me into a get_debug() function outside the class. After that, I return to the calling code with an initialised Task object. I've observed similar behaviour with Task.__next_step(): I just step into code that gets called by this method.
All Python versions (3.9, 3.10), IDEs (PyCharm, Visual Studio Code) and OSs (macOS, Windows) that I tested showed the same issue.
Does anyone know the reason for the debugger’s strange behaviour and, possibly, how to overcome it?
The call_soon() in the last line of the screenshot is issued from within Task.__init__. However, as you can see, the debugger never stepped into the initializer.
Update: Surprisingly, with Python 3.6 (Pythonista on iPadOS) I can step into Task.__init__ from base_events.BaseEventLoop.create_task().
The implementation of Task is in C (where available). The debugger cannot step into C code.
You can see this in the asyncio.tasks module:
class Task(futures._PyFuture):
    ...

_PyTask = Task

try:
    import _asyncio
except ImportError:
    pass
else:
    # _CTask is needed for tests.
    Task = _CTask = _asyncio.Task
That last line shows the Python implementation being overridden by the C implementation.
You can verify which implementation you have by inspecting the __module__ attribute of Task. e.g.
import asyncio
print(asyncio.Task.__module__)
A pure Python implementation will print asyncio.tasks. The C implementation will print _asyncio.
So on Windows the signal and thread approaches are in general bad ideas / don't work for timing out functions.
I've made the following timeout code, which throws a timeout exception from multiprocessing when the code takes too long. This is exactly what I want.
from multiprocessing import Pool

def timeout(timeout, func, *arg):
    with Pool(processes=1) as pool:
        result = pool.apply_async(func, (*arg,))
        return result.get(timeout=timeout)
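For reference, using it looks roughly like this (just a sketch; slow_add is a placeholder function, and the timeout function above is assumed to be defined in the same module):

import time
from multiprocessing import TimeoutError

def slow_add(x, y):
    time.sleep(5)
    return x + y

if __name__ == '__main__':
    try:
        print(timeout(2, slow_add, 1, 2))
    except TimeoutError:
        print("function timed out")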
I'm now trying to get this into a decorator style so that I can add it to a wide range of functions, especially those where external services are called and I have no control over the code or duration. My current attempt is below:
class TimeWrapper(object):

    def __init__(self, timeout=10):
        """Timing decorator"""
        self.timeout = timeout

    def __call__(self, f):
        def wrapped_f(*args):
            with Pool(processes=1) as pool:
                result = pool.apply_async(f, (*args,))
                return result.get(timeout=self.timeout)
        return wrapped_f
It gives a pickling error:
@TimeWrapper(7)
def func2(x, y):
    time.sleep(5)
    return x*y

  File "C:\Users\rmenk\AppData\Local\Continuum\anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function func2 at 0x000000770C8E4730>: it's not the same object as __main__.func2
I suspect this is due to multiprocessing and the decorator not playing nicely together, but I don't actually know how to make them cooperate. Any ideas on how to fix this?
PS: I've done some extensive research on this site and other places but haven't found any answers that work, be it with pebble, threading, as a function decorator or otherwise. If you have a solution that you know works on Windows and Python 3.5, I'd be very happy to just use that.
What you are trying to achieve is particularly cumbersome on Windows. The core issue is that when you decorate a function, you shadow it. This happens to work just fine on UNIX because it uses the fork strategy to create a new process.
On Windows, though, the new process will be a blank one in which a brand new Python interpreter is started and loads your module. When the module gets loaded, the decorator hides the real function, making it hard for the pickle protocol to find.
The only way to get it right is to rely on a trampoline function being set up during the decoration. You can take a look at how this is done in pebble but, as long as you're not doing it as an exercise, I'd recommend using pebble directly, as it already offers what you are looking for.
from pebble import concurrent

@concurrent.process(timeout=60)
def my_function(var, keyvar=0):
    return var + keyvar

future = my_function(1, keyvar=2)
future.result()
The only problem you have here is that you tested the decorated function in the __main__ context. Move it out to a different module and it will probably work.
I wrote wrapt_timeout_decorator, which uses wrapt & dill & multiprocess & pipes instead of pickle & multiprocessing & queue, because it can serialize more datatypes.
It might look simple at first, but a reliable timeout decorator on Windows is quite tricky - you might use mine, it's quite mature and tested:
https://github.com/bitranox/wrapt_timeout_decorator
On Windows the main module is imported again (but with __name__ != '__main__') because Python is trying to simulate forking-like behavior on a system that doesn't support forking. multiprocessing tries to create an environment similar to your main process by importing the main module again with a different name. That's why you need to shield the entry point of your program with the famous "if __name__ == '__main__':".
import lib_foo

def some_module():
    lib_foo.function_foo()

def main():
    some_module()

# here the subprocess stops loading, because __name__ is NOT '__main__'
if __name__ == '__main__':
    main()
This is a problem of the Windows OS, because Windows does not support "fork".
You can find more information on that here:
Workaround for using __name__=='__main__' in Python multiprocessing
https://docs.python.org/2/library/multiprocessing.html#windows
Since main.py is loaded again with a name other than '__main__', the decorated function now points to objects that do not exist anymore; therefore you need to put the decorated classes and functions into another module. In general (especially on Windows), the main() program should not contain anything but the main function; the real work should happen in the modules. I am also used to putting all settings and configuration in a separate file, so all processes or threads can access them (and also to keep them together in one place, not to forget type hints and name completion in your favorite editor).
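A minimal sketch of that layout (module and function names here are only illustrative), following the basic usage shown in the wrapt_timeout_decorator README:

# lib_test.py - the decorated function lives in a module, not in __main__
from wrapt_timeout_decorator import timeout

@timeout(5)
def mytest(message):
    print(message)

# main.py - the entry point only wires things together
from lib_test import mytest

if __name__ == '__main__':
    mytest("hello world")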
The "dill" serializer is able to serialize also the main context, that means the objects in our example are pickled to "main.lib_foo", "main.some_module","main.main" etc. We would not have this limitation when using "pickle" with the downside that "pickle" can not serialize following types:
functions with yields, nested functions, lambdas, cell, method, unboundmethod, module, code, methodwrapper, dictproxy, methoddescriptor, getsetdescriptor, memberdescriptor, wrapperdescriptor, xrange, slice, notimplemented, ellipsis, quit
Additionally, dill supports:
save and load python interpreter sessions, save and extract the source code from functions and classes, interactively diagnose pickling errors
To support more types with the decorator, we selected dill as the serializer, with the small downside that methods and classes cannot be decorated in the __main__ context, but need to reside in a module.
You can find more information on that here: Serializing an object in __main__ with pickle or dill
I have one script that calls another - let's call them master.py and fetch.py. I suppose the second script could be integrated into the first, but it does have distinct functionality - so keeping them separate seems like a good way to force myself to learn how to call outside scripts.
Here's the basic structure of fetch.py:
<import block>

infiles = <paths>
arcpy.env.workspace = os.path.dirname(infile)
ws = arcpy.env.workspace
newfile_list = []

def main():
    name = <name>
    if not arcpy.Exists(name + ".gdb"):
        global ws
        new_gdb = DM.CreateFileGDB(ws, name + ".gdb")
        newfile_list.append(new_gdb)
    other_func1()
    other_func2()
    print "\nNew files from fetch.py:"
    for i in newfile_list:
        print " " + i

def other_func1():
    stuff

def other_func2():
    stuff

if __name__ == '__main__':
    main()
And master.py:
<import block>

infiles = <paths>

def f1():
    stuff

def f2():
    stuff

import fetch
fetch.main()
f1()
f2()
The problems, concerning placement of the import block & file definitions of fetch.py:
When I put them inside main() and run it as a standalone script, my arcpy functions don't work because my various imports haven't run yet. Putting them ahead of main() and making them global solves this problem.
But when I put them outside of main() as you see here, I get an error saying the local variable ws is referenced before assignment. I think this may have to do with my calling fetch.main(), where the initial lines don't get read. (I was able to make it work by declaring global ws, but I don't know if this is advisable.)
How do I structure fetch.py so that the import statements and file definitions get read, both when run as a standalone script and when called?
Instead of rewriting your code for you, I will try to point you to some ideas from the developer philosophy toolkit to get you on your way to refining your approach.
My feeling is that you should contemplate YAGNI and KISS when thinking about your code.
In your comment you write:
I'm keeping the 'fetch' script separate because finding feature orientation seems like a useful stand-alone tool down the road.
Like I say: YAGNI and KISS: If you need it as a standalone tool in the future then split it out in the future. For now keep it simple and put the code that belongs together in one module and make it callable exactly the way you need it. When you work with it and understand the problem better and see what else is needed, you can then add other ways of calling your script. There is no reason why you shouldn't keep all the code in one module while it is still manageable and just add different ways to call it.
If the time comes that you want to organize your code into several modules you can refactor it in a way that you pull out the things you do on the module level at the moment (either into functions or maybe even classes). Then you can control exactly what should happen when and you can control the order of things better.
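To illustrate the idea only (a very rough sketch, not a rewrite of your actual code; the names are made up), the module-level setup could move into functions roughly like this:

# fetch.py - module-level work moved into functions
import os
import arcpy
from arcpy import management as DM

def setup_workspace(infile):
    """Derive the workspace from the input file, set it, and return it."""
    arcpy.env.workspace = os.path.dirname(infile)
    return arcpy.env.workspace

def main(infile, name):
    ws = setup_workspace(infile)
    newfile_list = []
    if not arcpy.Exists(name + ".gdb"):
        newfile_list.append(DM.CreateFileGDB(ws, name + ".gdb"))
    return newfile_list

if __name__ == '__main__':
    main("<path to an input file>", "<name>")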
As further reading in regards to structuring a Python project I would recommend the pypa sample project and Starting A Python Project The Right Way. The problems you have with imports will likely cease to exist if you structure your code and your project in a sensible, pythonic way.
I'm trying to test a Python script (2.7) where I work with standard input (read with raw_input() and written with a simple print), but I can't find out how to do this, and I'm sure the issue is really simple.
This is a very, very condensed version of my script:
def example():
    number = raw_input()
    print number

if __name__ == '__main__':
    example()
I want to write a unittest test to check this, but I can't figure out how. I've tried with StringIO and other things, but I haven't found a really simple way to do it.
Does somebody have an idea?
PS: Of course, in the real script I use data blocks with several lines and other kinds of data.
Thank you so much.
EDIT:
Thank you so much for the first really specific answer; it works perfectly. I only had a little problem importing StringIO, because I was doing import StringIO and I needed to import it like from StringIO import StringIO (I don't really understand why), but be that as it may, it works.
But I've found another problem using this approach. In my project I need to test scripts this way (which works perfectly thanks to your support), but I want to do the following:
I have a file with a lot of tests to run against a script, so I open the file and read blocks of input data along with their expected result blocks, and I would like the code to process one block, check its result, then do the same with the next one, and so on...
Something like this:
class Test(unittest.TestCase):
    ...
    # open file and process, saving data like datablocks and results
    ...
    allTest = True
    for test in tests:
        stub_stdin(self, test.dataBlock)
        stub_stdouts(self)
        runScrip()
        if sys.stdout.getvalue() != test.expectResult:
            allTest = False
    self.assertEqual(allTest, True)
I know that maybe unittest doesn't make sense used this way, but it should give you an idea of what I want. Anyway, this approach fails and I don't know why.
Typical techniques involve mocking out the standard sys.stdin and sys.stdout with your desired replacements. If you do not care about Python 3 compatibility, you can just use the StringIO module; however, if you are forward thinking and willing to restrict yourself to Python 2.7 and 3.3+, supporting both Python 2 and 3 this way becomes possible without too much work through the io module (it requires a bit of modification, but put that thought on hold for now).
Assuming you already have a unittest.TestCase going, you can create a utility function (or method in the same class) that will replace sys.stdin/sys.stdout as outlined. First the imports:
import sys
import io
import unittest
In one of my recent projects I've done this for stdin; it takes a str for the inputs that the user (or another program, through pipes) would feed into your program as stdin:
def stub_stdin(testcase_inst, inputs):
    stdin = sys.stdin

    def cleanup():
        sys.stdin = stdin

    testcase_inst.addCleanup(cleanup)
    sys.stdin = StringIO(inputs)
As for stdout and stderr:
def stub_stdouts(testcase_inst):
    stderr = sys.stderr
    stdout = sys.stdout

    def cleanup():
        sys.stderr = stderr
        sys.stdout = stdout

    testcase_inst.addCleanup(cleanup)
    sys.stderr = StringIO()
    sys.stdout = StringIO()
Note that in both cases the function accepts a test case instance and calls its addCleanup method, which registers a cleanup function that resets them back to what they were once the test method is over. The effect is that, from when this is invoked in the test case until the end, sys.stdout and friends will be replaced with the io.StringIO version, meaning you can check their values easily and don't have to worry about leaving a mess behind.
Better to show this as an example. To use this, you can simply create a test case like so:
class ExampleTestCase(unittest.TestCase):
    def test_example(self):
        stub_stdin(self, '42')
        stub_stdouts(self)
        example()
        self.assertEqual(sys.stdout.getvalue(), '42\n')
Now, in Python 2, this test will only pass if the StringIO class is from the StringIO module, and in Python 3 no such module exists. What you can do is use the version from the io module with a modification that makes it slightly more lenient about what input it accepts, so that the unicode encoding/decoding is done automatically rather than triggering an exception (print statements in Python 2, for example, will not work nicely without the following). I typically do this for cross-compatibility between Python 2 and 3:
class StringIO(io.StringIO):
    """
    A "safely" wrapped version
    """

    def __init__(self, value=''):
        value = value.encode('utf8', 'backslashreplace').decode('utf8')
        io.StringIO.__init__(self, value)

    def write(self, msg):
        io.StringIO.write(self, msg.encode(
            'utf8', 'backslashreplace').decode('utf8'))
Now plug your example function plus every code fragment in this answer into one file, and you will get a self-contained unittest that works in both Python 2 and 3 (although in Python 3 you need to call print as a function) for testing against stdio.
One more note: you can always put the stub_ function calls in the setUp method of the TestCase if every single test method requires that.
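For instance, a small sketch of that, reusing the stub_stdin/stub_stdouts helpers defined above:

class ExampleTestCaseWithSetUp(unittest.TestCase):

    def setUp(self):
        # every test method starts out with freshly stubbed stdin/stdout
        stub_stdin(self, '42')
        stub_stdouts(self)

    def test_example(self):
        example()
        self.assertEqual(sys.stdout.getvalue(), '42\n')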
Of course, if you want to use one of the various mocking-related libraries out there to stub out stdin/stdout, you are free to do so, but this approach relies on no external dependencies, if that is your goal.
For your second issue: test cases have to be written in a certain way, where they are encapsulated within a method and not placed at the class level, so your original example will fail. However, you might want to do something like this:
class Test(unittest.TestCase):
    def helper(self, data, answer, runner):
        stub_stdin(self, data)
        stub_stdouts(self)
        runner()
        self.assertEqual(sys.stdout.getvalue(), answer)
        self.doCleanups()  # optional, see comments below

    def test_various_inputs(self):
        data_and_answers = [
            ('hello', 'HELLOhello'),
            ('goodbye', 'GOODBYEgoodbye'),
        ]
        runScript = upperlower  # the function I want to test
        for data, answer in data_and_answers:
            self.helper(data, answer, runScript)
The reason why you might want to call doCleanups is to prevent the cleanup stack from getting as deep as the number of data_and_answers pairs; but note that it pops everything off the cleanup stack, so if you have any other things that need to be cleaned up at the end, this might be problematic. You are also free to leave it out, as all of the stdio-related objects will be restored at the end in the same order, so the real ones will always end up back in place. Now the function I wanted to test:
def upperlower():
    raw = raw_input()
    print (raw.upper() + raw),
So yes, a bit of explanation for what I did might help. Remember that within a TestCase class, the framework relies strictly on the instance's assertEqual and friends for it to function. So to ensure testing is done at the right level, you really want to call those asserts every time, so that helpful error messages will be shown at the moment the error occurred, with the inputs/answers that didn't come out right, rather than only at the very end like your for loop does (which will tell you that something was wrong, but not exactly where out of the hundreds of cases, and now you are mad). Also, the helper method: you can call it anything you want, as long as it doesn't start with test, because then the framework will try to run it as a test and it will fail terribly. Just follow this convention and you can basically have templates within your test case to run your tests with; you can then use it in a loop with a bunch of inputs/outputs like I did.
As for your other question:
I only had a little problem importing StringIO, because I was doing import StringIO and I needed to import it like from StringIO import StringIO (I don't really understand why), but be that as it may, it works.
Well, if you look at my original code, I did show you how I imported io and then overrode the io.StringIO class by defining class StringIO(io.StringIO). However, the plain StringIO import works for you because you are doing this strictly from Python 2, whereas I try to target my answers at Python 3 whenever possible, given that Python 2 will (probably definitely, this time) not be supported in less than 5 years. Think of the future users who might read this post with a problem similar to yours. Anyway, yes, the original from StringIO import StringIO works, as that's the StringIO class from the StringIO module. from cStringIO import StringIO should also work, as that imports the C version of the StringIO module. They all offer close enough interfaces, so they will basically work as intended (until, of course, you try to run this under Python 3).
Again, putting all this together along with my code should result in a self-contained working test script. Do remember to look at the documentation and follow the form of the code, and not invent your own syntax while hoping things will work. (As for exactly why your code didn't work: the "test" code was defined where the class was being constructed, so all of it was executed while Python was importing your module, and since none of the things needed for the test to run were even available yet (namely, the class itself didn't even exist), the whole thing just dies in fits of twitching agony.) Asking questions here helps too; even though the issue you face is something really common, not having a quick and simple name to search for your exact problem does make it difficult to figure out where you went wrong, I suppose? :) Anyway, good luck, and good on you for taking the effort to test your code.
There are other methods, but given that the other questions/answers I looked at here on SO didn't seem to help, I hope this one does. Others, for reference:
How to supply stdin, files and environment variable inputs to Python unit tests?
python mocking raw input in unittests
Naturally, it bears repeating that all of this can be done using unittest.mock, available in Python 3.3+, or the original/rolling backport mock on PyPI; but given that those libraries hide some of the intricacies, they may end up obscuring some of the details of what actually happens (or needs to happen) or how the redirection actually works. If you want, you can read up on unittest.mock.patch and scroll down slightly to the section on patching sys.stdout with StringIO.
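For completeness, a Python 3 sketch of that mock-based route (a hypothetical test, equivalent in spirit to the StringIO approach above; in Python 3 the example() function would use input() instead of raw_input()):

import io
import unittest
from unittest import mock

class MockedStdioTestCase(unittest.TestCase):

    def test_example(self):
        with mock.patch('sys.stdin', io.StringIO('42')), \
             mock.patch('sys.stdout', new_callable=io.StringIO) as fake_out:
            example()
        self.assertEqual(fake_out.getvalue(), '42\n')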
I love being able to modify the arguments that get sent to a function using settrace, like:
import sys

def trace_func(frame, event, arg):
    value = frame.f_locals["a"]
    if value % 2 == 0:
        value += 1
    frame.f_locals["a"] = value

def f(a):
    print a

if __name__ == "__main__":
    sys.settrace(trace_func)
    for i in range(0, 5):
        f(i)
And this will print:
1
1
3
3
5
What other cool stuff can you do using settrace?
I would strongly recommend against abusing settrace. I'm assuming you understand this stuff, but others coming along later may not. There are a few reasons:
Settrace is a very blunt tool. The OP's example is a simple one, but there's practically no way to extend it for use in a real system.
It's mysterious. Anyone coming to look at your code would be completely stumped why it was doing what it was doing.
It's slow. Invoking a Python function for every line of Python executed is going to slow down your program by many multiples.
It's usually unnecessary. The original example here could have been accomplished in a few other ways (modify the function, wrap the function in a decorator, call it via another function, etc), any of which would have been better than settrace.
It's hard to get right. In the original example, if you had not called f directly, but instead called g which called f, your trace function wouldn't have done its job, because you returned None from the trace function, so it's only invoked once and then forgotten.
It will keep other tools from working. This program will not be debuggable (because debuggers use settrace), it will not be traceable, it will not be possible to measure its code coverage, etc. Part of this is due to lack of foresight on the part of the Python implementors: they gave us settrace but no gettrace, so it's difficult to have two trace functions that work together.
Trace functions make for cool hacks. It's fun to be able to abuse it, but please don't use it for real stuff. If I sound hectoring, I apologize, but this has been done in real code, and it's a pain. For example, DecoratorTools uses a trace function to perform the magic feat of making this syntax work in Python 2.3:
# Method decorator example
from peak.util.decorators import decorate

class Demo1(object):
    decorate(classmethod)   # equivalent to @classmethod
    def example(cls):
        print "hello from", cls
A neat hack, but unfortunately, it meant that any code that used DecoratorTools wouldn't work with coverage.py (or debuggers, I guess). Not a good tradeoff if you ask me. I changed coverage.py to provide a mode that lets it work with DecoratorTools, but I wish I hadn't had to.
Even code in the standard library sometimes gets this stuff wrong. Pyexpat decided to be different than every other extension module, and invoke the trace function as if it were Python code. Too bad they did a bad job of it.
</rant>
I made a module called pycallgraph which generates call graphs using sys.settrace().
Of course, code coverage is accomplished with the trace function. One cool thing we haven't had before is branch coverage measurement, and that's coming along nicely, about to be released in an alpha version of coverage.py.
So for example, consider this function:
def foo(x):
    if x:
        y = 10
    return y
if you test it with this call:
assert foo(1) == 10
then statement coverage will tell you that all the lines of the function were executed. But of course, there's a simple problem in that function: calling it with 0 raises an UnboundLocalError.
Branch measurement would tell you that there's a branch in the code that isn't fully exercised, because only one leg of the branch is ever taken.
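With a version of coverage.py that has branch measurement, running it looks roughly like this (the file name is just an example):

coverage run --branch example.py
coverage report -m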
For example, get the memory consumption of Python code line-by-line: http://pypi.python.org/pypi/memory_profiler
One recent project that uses settrace heavily is PySnooper.
It helps new programmers to trace/log/monitor their program output. Cheers!
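Its basic usage is a one-line decorator; the sketch below is roughly the example from its README:

import pysnooper

@pysnooper.snoop()
def number_to_bits(number):
    # every executed line and variable change in here is logged to stderr
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    else:
        return [0]

number_to_bits(6)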
I don't have an exhaustively comprehensive answer but one thing I did with it, with the help of another user on SO, was create a program that generates the trace tables of other Python programs.
The Python debugger pdb uses sys.settrace to analyse the lines it is debugging.
Here's a C optimization/extension for pdb that also uses sys.settrace:
https://bitbucket.org/jagguli/cpdb