Replacement for execfile in IPython3?

There are various ways of re-enabling execfile-like behaviour in Python 3.x environments - in the documentation and here on Stack Overflow - but I did not find an exact replacement for my use case.
I am using IPython, and in Python 2.7.x execfile used to run a script file just as if I had typed the exact same lines straight into IPython. This includes:
1. useful exception tracebacks are given
2. local variables of my environment are available in the scripted code
3. variables defined locally in the script are available in the environment (after the execfile call, of course)
4. import X as Y statements in the script also make Y available in the environment
5. the execfile call works in interactive mode and also directly in Python scripts
6. execution of the entire script code is guaranteed for each call (except when hitting an exception)
7. execfile is readily available wherever Python is - no lengthy definition or import of an obscure package
Common solutions that have not entirely worked so far:
from scriptfile import * does not satisfy #2 and #4. For function definitions it also fails #6, as re-issuing the import does not update a function - this can be remedied with a reload(scriptfile) call.
The exec(scriptfilehandle.read()) construction satisfies #5-#7. With some amendments (passing the right globals/locals dictionaries) it can also cover #2-#4, but this grows into a lengthy helper definition - a rough sketch is given after this list - and the tracebacks are still a mess.
IPython's %run scriptfile is nice, but falls short at least on requirements #2, #4 and #5.
Copying the script code from the file and using IPython's %paste misses #5 and #7 - and is quite cumbersome to do for each call.
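Roughly, the helper I end up with looks like this (an untested sketch - run_script is just a name I made up, and the frame inspection is my own workaround for reaching the caller's namespace):
import inspect

def run_script(path, namespace=None):
    # execute the file in the caller's global namespace by default, so the
    # script sees my variables (#2) and its definitions and imports land back
    # in my environment (#3, #4)
    if namespace is None:
        namespace = inspect.currentframe().f_back.f_globals
    with open(path) as f:
        source = f.read()
    # compiling with the real filename makes tracebacks point into the script (#1)
    code = compile(source, path, 'exec')
    exec(code, namespace)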
Do you have any solutions I have not yet heard of?
I am using IPython+execfile while playing around with data, producing (lots of) matplotlib figures, trying stuff,...
And if I like some lines I wrote, I put the code snippet in a script. Some examples of what I am doing:
writing a script that prepares the environment for a specific dataset: doing imports, loading some data, defining some useful functions to work on this dataset,...
semiautomatic plotting: elaborate script for beautifully plotting ten figures of data held in a local variable, then revising the plot-script, and re-executing it, then filtering the data, re-executing plot-script,...
writing a script that utilizes several of my smaller snippets, to be run overnight on a large dataset
apart from data exploration and plotting, I sometimes need to write small scripts on various systems: a RasPi, a router with OpenWRT, a machine without internet access, a Windows machine (without admin rights) - all of which may have their own restrictions on which libraries are available
On the other hand, I have to admit that I'm not a professional programmer - my insight into Python's inner workings with local/global variables, and into what really happens in an import statement, is very limited.
Any help - be it a solution to my problems or a helpful explanation - would be greatly appreciated!

Related

What is the best or proper way to allow debugging of generated code?

For various reasons, in one project I generate executable code by generating an AST from various source files and then compiling that to bytecode (though the question could also apply to cases where the bytecode is generated directly, I guess).
From some experimentation, it looks like the debugger more or less just uses the lineno information embedded in the AST, alongside the filename passed to compile, to provide a representation for the debugger's purposes; however, this assumes the code being executed comes from a single on-disk file.
That is not necessarily the case for my project: the executable code can be pieced together from multiple sources, and some or all of these sources may have been fetched over the network or retrieved from non-disk storage (e.g. a database).
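For concreteness, the pattern in question looks roughly like this (the source and filename are made up; the linecache registration at the end is one workaround I know of for getting plain tracebacks to display a virtual source, though I don't know how well debuggers cope with it):
import ast, linecache

source = "def greet(name):\n    return 'hello ' + name\n"   # pieced-together "virtual source"
filename = "<generated:example>"                             # synthetic, nothing on disk

tree = ast.parse(source, filename=filename)   # AST nodes carry lineno/col_offset
code = compile(tree, filename, "exec")

# register the virtual source with linecache so tracebacks (which look lines up
# via linecache) can display it; mtime=None keeps checkcache() from discarding it
linecache.cache[filename] = (len(source), None, source.splitlines(True), filename)

namespace = {}
exec(code, namespace)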
And so my questions, which may be the wrong ones (hence the background):
is it possible to provide a memory buffer of some sort, or is it necessary to generate a singular on-disk representation of the "virtual source"?
how well would the debugger deal with jumping around between the different bits and pieces if the virtual source can't or should not be linearised?[0]
and just in case, is the assumption of Python only supporting a single contiguous source file correct, or can it actually be fed multiple sources somehow?
[0] for instance a web-style literate program would be debugged in its original form, jumping between the code sections, not in the so-called "tangled" form
Some of this can be handled by the trepan3k debugger. For other things, various hooks are in place.
First of all, it can debug based on bytecode alone. But of course stepping won't be possible if the line number table doesn't exist. For that reason, if for no other, I would add a "line number" for each logical stopping point, such as at the beginning of statements. The numbers don't have to be line numbers; they could just count from 1 or be indexes into some other table. This is more or less how Go's Pos type works.
The debugger will let you set a breakpoint on a function, but that function has to exist, and when you start any Python program most of the functions you define don't exist yet. So the typical way to do this is to modify the source to call the debugger at some point. In trepan3k the lingo for this is:
from trepan.api import debug; debug()
Do that at a place where the other functions you want to break on have already been defined.
And the functions can be specified as methods on existing variables, e.g. self.my_function().
One of the advanced features of this debugger is that it will decompile the bytecode to produce source code. There is a command called deparse which will show you the context around where you are currently stopped.
Deparsing bytecode, though, is a bit difficult, so depending on which kind of bytecode you have, the results may vary.
As for the virtual source problem: that situation is somewhat tolerated in the debugger, since that kind of thing has to go on when there is no source at all. To facilitate this, and remote debugging (where the file locations locally and remotely can differ), we allow for filename remapping.
Another library, pyficache, is used for this remapping; I believe it has the ability to remap contiguous lines of one file onto lines in another file, and I think you could use this over and over again. However, so far there hasn't been a need for this, and that code is pretty old, so someone would have to beef up trepan3k here.
Lastly, related to trepan3k is trepan-xpy, a CPython bytecode debugger which can step bytecode instructions even when the line number table is empty.

Workflow with multiple files using a REPL (Python)

My general workflow uses two screens: one for the script, one for the interactive buffer. I then evaluate parts of the script code in the interactive buffer. This is really nice when working on a small project (I just re-evaluate the code I changed at that moment, everything else staying equal). I'm convinced it allows for the fastest iterations when writing a script.
However, I'm now working on a project where I try to be neat and organize my project with a single class per file (or close to it).
Now here lies the issue: while it is easy to evaluate parts of the code, in Python it is difficult to re-import a module once it has already been imported - a plain import statement will not pick up my changes (the usual workaround, importlib.reload, is sketched below, but it quickly gets awkward across several interdependent files).
Mind that most of the time I have useful objects in the interactive buffer / global scope (some objects may have taken 10 minutes to build). This means that I can't just close and reopen everything.
Are others struggling with this as well? How to conveniently work with multiple files and a Python REPL?
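For reference, the reload workaround I mean looks roughly like this (mymodule is a made-up name):
import importlib
import mymodule                         # hypothetical module, imported earlier in the session

# ... edit mymodule.py on disk ...

mymodule = importlib.reload(mymodule)   # re-executes the module's code
obj = mymodule.MyClass()                # new objects pick up the changes; existing ones keep the old code
Note that names pulled in elsewhere with from mymodule import ... keep pointing at the old objects, which is part of what makes this awkward across multiple files.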
Using the brilliant elpy package allows this from the latest release (available on MELPA).
You can assign a dedicated REPL to each script as you like. It helps to rename the REPL sessions using M-x rename-buffer.
Here is a snippet from the relevant documentation:
M-x elpy-shell-toggle-dedicated-shell
By default, python buffers are all attached to the same python shell (the one that lies in the Python buffer), meaning that all buffers and code fragments will be sent to this shell.
elpy-shell-toggle-dedicated-shell attaches a dedicated python shell
(not shared with the other python buffers) to the current python
buffer. To make this the default behavior (like the deprecated option
elpy-dedicated-shells did), use the following snippet:
(add-hook 'elpy-mode-hook (lambda () (elpy-shell-toggle-dedicated-shell 1)))
M-x elpy-shell-set-local-shell
Attach the current python buffer to a specific python shell (whose name is asked with completion). You can use this function to have one
python shell per project, with:
(add-hook 'elpy-mode-hook (lambda () (elpy-shell-set-local-shell elpy-project-root)))
Here is the relevant GitHub issue, which was merged into the master branch on 16th Feb 2018.

Python workflow for testing modules without re-executing the entire codebase

I'm working on a python project that spans across multiple files and directories. Here's my workflow:
Run main python script
Main script calls some functions in other files
Functions in other files/directories execute
In the middle of execution, there is a bug in one of the functions, but I find the bug only after the main script finishes. Sometimes, there may not be a bug, but rather some parameter that needs tweaking.
I go back and fix the bug/make the necessary tweaks and re-run the main program and this time it executes fine.
Obviously, this workflow is terribly inefficient, as a considerable amount of code (everything that runs prior to the buggy function) gets re-executed. What would be ideal is to run the program in IPython and, after discovering the issue and making the necessary changes, restart from the place where the buggy function's execution starts, not from the beginning. I'm not sure how to achieve this and any help would be much appreciated.
I know how to rerun lines from the IPython history (%rerun) and how to ensure autoreload of changed files in IPython, but in this case I can't really type out the lines of code into IPython. Writing unit tests may not always be feasible, so I need an alternate solution. My use case is something like setting a "breakpoint" and then re-executing the code past the breakpoint multiple times, so as to avoid re-executing the code prior to the breakpoint more than once, while ensuring that all the necessary variables (up to that stage) are correctly populated. One final condition: I may not be able to use an IDE, and vim is the only editor available across all the environments I work with.
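(For reference, the autoreload setup mentioned above is just:)
# in IPython: automatically reload changed modules before executing new code
%load_ext autoreload
%autoreload 2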
You could start writing test cases for every function and try to debug each function separately instead of the whole program/script.
There is the Python unittest module: https://docs.python.org/3.4/library/unittest.html and a lot of tutorials, for example: http://docs.python-guide.org/en/latest/writing/tests/
Writing tests may seem annoying, but thinking about test cases gives a deeper understanding of "how should the function behave if...".
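A minimal sketch of what such a test module looks like (the module and function names are made up for illustration):
import unittest
from mymodule import add        # hypothetical function under test

class TestAdd(unittest.TestCase):
    def test_adds_two_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_rejects_strings(self):
        with self.assertRaises(TypeError):
            add("2", 3)

if __name__ == "__main__":
    unittest.main()
Each function can then be exercised on its own, without re-running the whole pipeline first.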
You could use the code module to include "breakpoints" in your code
import code
# ... later in your program add
code.interact(local=locals()) # enter python interpreter at this point (ctrl+D to continue execution)

Debugging the python VM

Is there a debugger that can debug the Python virtual machine while it is running Python code, similar to the way that GDB works with C/C++? I have searched online and have come across pdb, but this steps through the code executed by the Python interpreter, not the Python interpreter itself as it's running the program.
The reference implementation of Python, CPython, is written in C. You can use GDB to debug it as you would debug any other program written in C.
That said, Python does have a few little helpers for use in GDB buried under Misc/gdbinit. It's got comments to describe what each command does, but I'll repeat them here for convenience:
pyo: Dump a PyObject *.
pyg: Dump a PyGC_Head *.
pylocals: Print the local variables of the current Python stack frame.
lineno: Get the current Python line number.
pyframe: Print the source file name, line, and function.
pyframev: pyframe + pylocals
printframe: pyframe if within PyEval_EvalFrameEx; built-in frame otherwise
pystack: Print the Python stack trace.
pystackv: Print the Python stack trace with local variables.
pu: Print a Unicode string.
It looks like the Fedora project has also assembled their own collection of commands to assist with debugging which you may want to look at, too.
If you're looking to debug Python at the bytecode level, that's exactly what pdb does.
If you're looking to debug the CPython reference interpreter… as icktoofay's answer says, it's just a C program like any other, so you can debug it the same way as any other C program. (And you can get the source, compile it with extra debugging info, etc. if you want, too.)
You almost certainly want to look at EasierPythonDebugging, which shows how to set up a bunch of GDB helpers (which are Python scripts, of course) to make your life easier. Most importantly: The Python stack is tightly bound to the C stack, but it's a big mess to try to map things manually. With the right helpers, you can get stack traces, frame dumps, etc. in Python terms instead of or in parallel with the C terms with no effort. Another big benefit is the py-print command, which can look up a Python name (in nearly the same way a live interpreter would), call its __repr__, and print out the result (with proper error handling and everything so you don't end up crashing your gdb session trying to walk the PyObject* stuff manually).
If you're looking for some level in between… well, there is no level in between. (Conceptually, there are multiple layers to the interpreter, but it's all just C code, and it all looks alike to gdb.)
If you're looking to debug any Python interpreter, rather than specifically CPython, you might want to look at PyPy. It's written in a Python-like language called RPython, and there are various ways to use pdb to debug the (R)Python interpreter code, although it's not as easy as it could be (unless you use a flat-translated PyPy, which will probably run about 100x too slow to be tolerable). There are also GDB debug hooks and scripts for PyPy just like the ones for CPython, but they're not as complete.

"writing a python binding" vs "using command-line directly"

I have a question regarding python bindings.
I have a command-line tool which exposes some functionality, and the code has been refactored to provide that functionality through a shared library as well. I wanted to know what real advantage I get from "writing a Python binding for the shared library" vs "calling the command line directly".
One obvious advantage, I think, will be performance: the shared library will be loaded into the same process and the functionality can be called within that process, avoiding the cost of spawning a new process through the command line.
Are there any other advantages I can get from writing a Python binding in such a case?
Thanks.
I can hardly imagine a case where one would prefer wrapping a library's command line interface over wrapping the library itself. (Unless there is a library that comes with a neat command line interface while being a total mess internally; but the OP indicates that the same functionality available via the command line is easily accessible in terms of library function calls).
The biggest advantage of writing a Python binding is a clearly defined data interface between the library and Python. Ideally, the library can operate directly on memory managed by Python, without any data copying involved.
To illustrate this, let's assume a library function does something more complicated than printing the current time, i.e., it obtains a significant amount of data as input, performs some operation, and returns a significant amount of data as output. If the input data is expected as an input file, Python would need to generate this file first. It must make sure that the OS has finished writing the file before invoking the tool on the command line (I have seen several C libraries where sleep(1) calls were used as a band-aid for this issue...). And Python must get the output back in some way.
If the command line interface does not rely on files but obtains all arguments on the command line and prints the output on stdout, Python probably needs to convert between binary data and string format, not always with the expected results. It also needs to pipe stdout back and parse it. Not a problem, but getting all this right is a lot of work.
What about error handling? Well, the command line interface will probably handle errors by printing error messages on stderr. So Python needs to capture, parse and process these as well. OTOH, the corresponding library function will almost certainly make a success flag accessible to the calling program. This is much more directly usable for Python.
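To make this concrete, wrapping the command line typically ends up looking something like this (the tool name, its flags, and the JSON convention are invented for illustration):
import json
import subprocess

def run_tool_cli(data):
    # spawn a process, feed input via stdin, parse stdout, and inspect stderr
    proc = subprocess.run(
        ["mytool", "--format", "json"],          # invented tool and flags
        input=json.dumps(data),
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError("mytool failed: " + proc.stderr)
    return json.loads(proc.stdout)               # hoping stdout really is clean JSON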
All of this is obviously affecting performance, which you already mentioned.
As another point, if you are developing the library yourself, you will probably find after some time that the Python workflow has made the whole command line interface obsolete, so you can drop supporting it altogether and save yourself a lot of time.
So I think there is a clear case to be made for the Python bindings. To me, one of the biggest strengths of Python is the ease with which such wrappers can be created and maintained. Unfortunately, there are about 7 or 8 equally easy ways to do this. To get started, I recommend ctypes, since it does not require a compiler and will work with PyPy. For best performance, use the native CPython C API, which I also found very easy to learn.
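For instance, a ctypes wrapper is roughly of this shape (the library name, function name, and C signature are invented for illustration):
import ctypes

lib = ctypes.CDLL("./libmytool.so")   # invented library name

# declare the (invented) C signature: double process(const double *data, size_t n)
lib.process.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.process.restype = ctypes.c_double

def process(values):
    arr = (ctypes.c_double * len(values))(*values)   # data passed in memory - no files, no parsing
    return lib.process(arr, len(values))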
