I am trying to profile a program using kcachegrind. My program uses netCDF4-python, which reads files using the syntax
data = var[:]
I am having trouble getting any information for the call tree below this step. I've had a look around to see what could be causing this, and it seems that because the syntax isn't an explicit function call, I cannot see which functions it calls or get a breakdown of how long they take.
I've thought about using line_profiler on the netCDF4-python library, but I think that because of how the library is installed I would not be able to add the @profile decorator to it.
Is there a way to get kcachegrind to descend into this implicit function call? Or is there a better tool for the task?
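For context, var[:] is just sugar for var.__getitem__(slice(None, None, None)), so a deterministic profiler such as cProfile does record the frames beneath it. A minimal sketch (the file and variable names here are placeholders); the resulting .prof file can be converted for KCachegrind with a tool such as pyprof2calltree:

import cProfile
import pstats
import netCDF4

def read_all():
    ds = netCDF4.Dataset("example.nc")     # placeholder file name
    var = ds.variables["temperature"]      # placeholder variable name
    return var[:]                          # desugars to var.__getitem__(...)

cProfile.run("read_all()", "read.prof")    # profile and dump stats to a file
pstats.Stats("read.prof").sort_stats("cumulative").print_stats(10)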
I'm not too well-versed in Python. I think this question is mostly language-agnostic, but I wanted to see if there was a particular way to do it in Python.
I'm working with a couple of PyTorch Python scripts.
The function test_qtensor_cpu on line 147 in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/quantization/core/test_quantized_tensor.py is executed when I run the script in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/test_quantization.py, but I cannot find the function that calls test_qtensor_cpu. I did a grep -ri "test_qtensor_cpu*" . in the root directory of this repo, and the only result was the definition of this function.
Is there a way for this function to be called without explicitly writing out the function's name?
Just add:
def my_func_i_cant_figure_out_whats_calling_it():
    import traceback
    traceback.print_stack()

That will show you the call stack at that point, even without a breakpoint.
Yes, it's possible to call a function without explicitly writing out its name.
(and figuring out how to use a debugger is super useful... future you will thank you if you figure it out sooner rather than later)
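For instance, a call site can build the method name at run time and look it up by string, so the name never appears literally as a call anywhere in the source (an illustrative example, not the actual PyTorch code):

class Suite:
    def test_qtensor_cpu(self):
        print("running test_qtensor_cpu")

name = "test_" + "qtensor_cpu"     # assembled at run time; grep finds no call site
method = getattr(Suite(), name)    # looked up by string
method()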
Line 29 of test_quantization.py imports TestQuantizedTensor (which includes the test_qtensor_cpu method).
The run_tests() call at the bottom of the file will automatically run all test cases that have been imported (which includes TestQuantizedTensor) via unittest (usually via unittest.main, though this can be changed by args passed to the test suite).
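A minimal sketch of that mechanism, with a stand-in test class rather than the real PyTorch one:

import unittest

class TestQuantizedTensor(unittest.TestCase):   # stand-in for the imported class
    def test_qtensor_cpu(self):
        self.assertTrue(True)

if __name__ == "__main__":
    # unittest.main() scans this module for TestCase subclasses and runs
    # every method whose name starts with "test", which is why
    # test_qtensor_cpu is executed without its name being written out.
    unittest.main()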
I would like to be able to inject a piece of code in the beginning of a given function.
Research mostly mentions using decorators, but for my specific use case I wouldn't want to wrap the modified function with an additional function call.
I am also unable to add a parameter to the function - I have an already compiled function at runtime and that's it.
The reason I wouldn't want to use a wrapper, is because I'd like to write a utility library which allows the programmer to "paste" a piece of code at the beginning of an already written function, without adding another level to the call stack at all. Mainly for performance reasons.
How can this be done? And would it work across Python versions without breaking?
Premature optimization is the root of all evil. You should not simply assume that a wrapper function will have a major performance impact; measure it first. There is no safe, simple, portable way to do what you're asking. The closest applicable mechanism is a custom metaclass, since it lets you control how instances of a class are created.
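If you want to test that assumption before reaching for anything exotic, a micro-benchmark along these lines (the function names are illustrative) shows what one extra stack frame actually costs:

import timeit

def payload(x):
    return x * x

def wrapped(x):
    return payload(x)   # one extra frame, as a decorator-style wrapper would add

# Compare the direct call with the wrapped call over a million iterations.
print(timeit.timeit("payload(10)", globals=globals()))
print(timeit.timeit("wrapped(10)", globals=globals()))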
I have a question regarding python bindings.
I have a command-line tool that exposes some functionality, and the code has been refactored to also provide that functionality through a shared library. I wanted to know what real advantage I would get from writing a Python binding for the shared library versus calling the command line directly.
One obvious advantage, I think, is performance: the shared library is loaded into the same process, so the functionality can be called within that process, avoiding the cost of spawning a new process for each command-line invocation.
Are there any other advantages I can get from writing a Python binding in such a case?
Thanks.
I can hardly imagine a case where one would prefer wrapping a library's command line interface over wrapping the library itself. (Unless there is a library that comes with a neat command line interface while being a total mess internally; but the OP indicates that the same functionality available via the command line is easily accessible in terms of library function calls).
The biggest advantage of writing a Python binding is a clearly defined data interface between the library and Python. Ideally, the library can operate directly on memory managed by Python, without any data copying involved.
To illustrate this, let's assume a library function does something more complicated than printing the current time, i.e., it obtains a significant amount of data as an input, performs some operation, and returns a significant amount of data as an output. If the input data is expected as an input file, Python would need to generate this file first. It must make sure that the OS has finished writing the file before calling the library via the command line (I have seen several C libraries where sleep(1) calls were used as a band-aid for this issue...). And Python must get the output back in some way.
If the command line interface does not rely on files but obtains all arguments on the command line and prints the output on stdout, Python probably needs to convert between binary data and string format, not always with the expected results. It also needs to pipe stdout back and parse it. Not a problem, but getting all this right is a lot of work.
What about error handling? Well, the command line interface will probably handle errors by printing error messages on stderr. So Python needs to capture, parse and process these as well. OTOH, the corresponding library function will almost certainly make a success flag accessible to the calling program. This is much more directly usable for Python.
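To make the command-line round trip concrete, the calling side looks roughly like this (the tool name, flags, and output format are all hypothetical):

import subprocess

proc = subprocess.run(
    ["mytool", "--input", "in.dat", "--op", "transform"],   # hypothetical CLI
    capture_output=True, text=True,
)
if proc.returncode != 0:
    # errors arrive as free-form text on stderr and must be parsed
    raise RuntimeError(proc.stderr.strip())
# results arrive as a string on stdout and must be converted back
values = [float(tok) for tok in proc.stdout.split()]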
All of this is obviously affecting performance, which you already mentioned.
As another point, if you are developing the library yourself, you will probably find after some time that the Python workflow has made the whole command line interface obsolete, so you can drop supporting it altogether and save yourself a lot of time.
So I think there is a clear case to be made for the Python bindings. To me, one of the biggest strengths of Python is the ease with which such wrappers can be created and maintained. Unfortunately, there are about 7 or 8 equally easy ways to do this. To get started, I recommend ctypes, since it does not require a compiler and will work with PyPy. For best performance use the native C-Python API, which I also found very easy to learn.
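As a minimal ctypes sketch, assuming a hypothetical libmytool.so that exports double process(const double *data, size_t n):

import ctypes

lib = ctypes.CDLL("./libmytool.so")   # hypothetical library name

# Declare the C signature so ctypes converts the arguments correctly.
lib.process.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.process.restype = ctypes.c_double

data = (ctypes.c_double * 3)(1.0, 2.0, 3.0)   # C array in Python-managed memory
result = lib.process(data, len(data))         # same process: no spawning, no parsing
print(result)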
I am experimenting with writing more forgiving/flexible functions, and I would like to know whether it is possible to access the input arguments of a function as strings, before Python checks for syntax errors, NameErrors, etc., for the purposes of doing my own input checking first.
No. What you are looking for is sophisticated macro functionality. You can do this in Lisp, but Python (like most languages) does not support it.
If you want, you can preprocess a file and parse it using the ast module. But you would have to do this as a separate step, before you run your Python script.
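A minimal sketch of that preprocessing step: you still cannot get past syntax errors (ast.parse raises SyntaxError), but you can recover each argument as a string before any name is resolved or any code runs. This uses ast.unparse, which requires Python 3.9+:

import ast

source = "my_func(undefined_name, 1 + 2)"   # example source to inspect

tree = ast.parse(source)     # a SyntaxError would surface here, before execution
call = tree.body[0].value    # the ast.Call node for my_func(...)
for arg in call.args:
    print(ast.unparse(arg))  # prints: undefined_name, then: 1 + 2

Nothing is executed, so undefined_name never raises a NameError.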
I have used the Python C API to call some Python code from my C code, and now I want to profile my Python code for bottlenecks. I came across the PyEval_SetProfile API and am not sure how to use it. Do I need to write my own profiling function?
I would be very thankful if you could provide an example or point me to one.
If you only need to know the amount of time spent in the Python code, and not (for example) where in the Python code the most time is spent, then the Python profiling tools are not what you want. I would write some simple C code that samples the time before and after the Python interpreter invocation, and use that. Alternatively, use C-level profiling tools and treat the Python interpreter invocation as just another C function call.
If you need to profile within the Python code, I wouldn't recommend writing your own profile function. All it does is provide you with raw data; you'd still have to aggregate and analyze it. Instead, write a Python wrapper around your Python code that invokes the cProfile module to capture data that you can then examine.
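A minimal sketch of such a wrapper (the module and function names are placeholders for whatever your C code currently invokes):

import cProfile

def profiled_entry_point():
    # Placeholder target: whatever Python code your C code calls into.
    import my_module                      # hypothetical module name

    profiler = cProfile.Profile()
    profiler.enable()
    my_module.main()                      # hypothetical entry function
    profiler.disable()
    profiler.dump_stats("embedded.prof")  # stats file to examine later

Have the C side call profiled_entry_point instead of the real entry point, then inspect the dump afterwards from a normal Python session with:

import pstats
pstats.Stats("embedded.prof").sort_stats("cumulative").print_stats(20)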