I have a script named "poc.py" that takes one positional command line argument "inputfile.txt". The poc.py script uses argparse to handle the positional command line argument and then passes the parsed args to main(). Once in main(), I read the input file, do some processing, create a pandas DataFrame, and finally plot the data. I am having difficulty manipulating my DataFrame and controlling the exact formatting of the resulting plot, so I'd like to try ipython to explore this interactively and see if I can get a better grasp on the "pythonic" ways to handle pandas/matplotlib.
So I tried experimenting with ipython and running a script, but I can't get ipython to keep my script's namespace.
I have tried this:
$ ipython --pylab -i poc.py inputfile.txt
Which runs my script just fine, and displays the plots (even without the blocking plt.show() call), but when the script is finished, the IPython who and whos commands say "Interactive namespace is empty". Likewise if I first enter the ipython shell and then do:
In [2]: run poc.py inputfile.txt
when the script is done (again, plenty of output, plots show up) I get the same result: an empty interactive namespace.
What am I missing in terms of understanding how to run an external script and use ipython to interactively explore the data/objects in my script?
Here is a barebones example of how my script (poc.py) is setup:
import argparse
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# etc ...more libraries and custom functions here...

def main(args):
    data = np.genfromtxt(args.inputfile)
    # (omitted)...more data processing / manipulation...
    pdata = pd.DataFrame(data)
    # (omitted)...more data processing / manipulation...
    plt.plot(pdata)
    # (omitted)...some formatting of the matplotlib/axes/figure objects
    plt.show()

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='''some program...''')
    parser.add_argument('inputfile', help='path to input file')
    args = parser.parse_args()
    main(args)
Answering myself here. I am aware that this could be a comment, but I decided the description warrants a standalone answer.
Basically, after spending some time understanding the issues everyone mentioned about my variables going out of scope, and the different ways to handle that, I found another solution that worked for me: the embed() function from IPython. For the debugging process, I would add
...
from IPython import embed
embed()
...
at the point in the script where I wanted to stop and look around. Once dropped into the IPython shell, I could investigate variable dimensions and experiment with manipulating things. When I found the combo I wanted, I would copy the commands, drop out of the interactive interpreter, and modify the script. The reason this worked for me is that it did not involve modifying the structure of the program simply to get debugging information.
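For the poc.py example above, a minimal sketch of where the call might go (inside main(), right after the DataFrame is built; imports are repeated so the snippet stands on its own):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython import embed

def main(args):
    data = np.genfromtxt(args.inputfile)
    pdata = pd.DataFrame(data)
    embed()  # drops into an IPython shell here; data and pdata are in scope
    plt.plot(pdata)
    plt.show()

Exiting the embedded shell (Ctrl-D) lets the rest of main() carry on from that point.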
The problem here is that data and pdata aren't in your script's namespace; they're in the local namespace of a function that's already run and completed.
If you want to be able to inspect them after the fact, you'll need to store them somewhere. For example:
    # ...
    plt.show()
    return data, pdata, plt

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='''some program...''')
    parser.add_argument('inputfile', help='path to input file')
    args = parser.parse_args()
    data, pdata, plt = main(args)
(Alternatively, you could just make all of main's variables global, but this way seems cleaner.)
Now, your script's namespace includes variables named data, pdata, and plt that have the values you want. Plus, you can call main again and pass it a different file and get back the values from that file.
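With that change in place, an interactive session might go something like this (commands only; the values you see will depend on your input file):

In [1]: run poc.py inputfile.txt
In [2]: whos
In [3]: pdata.describe()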
I'm working on cloning a Virtual Machine (VM) in a vCenter environment using this code. It takes command line arguments for the name of the VM, template, datastore, etc. (e.g. $ clone_vm.py -s <host_name> -p <password> -nossl ....)
I have another Python file where I've been able to list the Datastore volumes in descending order of free_storage. I have stored the datastore with maximum available storage in a variable ds_max. (Let's call this ds_info.py)
I would like to use the ds_max variable from ds_info.py as the value of the datastore command line argument in clone_vm.py.
I tried importing the os module in ds_info.py and running os.system(python clone_vm.py ....arguments...) but it did not take the ds_max variable as an argument.
I'm new to coding and am not confident to change the clone_vm.py to take in the Datastore with maximum free storage.
Thank you for taking the time to read through this.
I suspect there is something wrong in your os.system call, but you don't provide it, so I can't check.
Generally it is a good idea to use the current paradigm, and the received wisdom (TM) is that we use subprocess. See the docs, but the basic pattern is:
from subprocess import run
cmd = ["mycmd", "--arg1", "--arg2", "val_for_arg2"]
run(cmd)
Since this is just a list, you can easily drop arguments into it:
var = "hello"
cmd = ["echo", var]
run(cmd)
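Applied to your case, that might look something like this (the -d flag and the example datastore value are assumptions here; use whatever arguments clone_vm.py actually expects):

import sys
from subprocess import run

ds_max = "datastore-42"  # in your code this comes from the vCenter query in ds_info.py
cmd = [sys.executable, "clone_vm.py",
       "-s", "host_name",
       "-d", ds_max]      # hypothetical -d/--datastore flag
run(cmd)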
However, if your other command is in fact a Python script, it is more normal to refactor your script so that the main functionality is wrapped in a function, called main by convention:
# script 2
...

def main(arg1, arg2, arg3):
    do_the_work

if __name__ == "__main__":
    args = get_sys_args()  # dummy fn
    main(*args)
Then you can simply import script2 from script1 and run the code directly:
# script 1
from script2 import main
args = get_args() # dummy fn
main(*args)
This is 'better' as it doesn't involve spawning a whole new python process just to run python code, and it generally results in neater code. But nothing stops you calling a python script the same way you'd call anything else.
I'm using several Jupyter Notebooks to split the tasks between different modules. In my main notebook I call another module with %run another_module.ipynb, which loads all my data. However, it also plots and prints everything I have in another_module.ipynb.
I want to keep the plots in another_module.ipynb to help me visualise the data, but I don't want to reprint everything when calling %run another_module.ipynb. Is there an option to prevent printing this?
Thanks
You could:
Override the print function and make it a no-op:
_print_function = print # create a backup in case you need it later
globals()["print"] = lambda *args, **kwargs: None
Run the file with the -i flag. Without -i, the file is run in a new namespace, so your modifications to the global variables are lost; with -i, the file is run in the current namespace.
%run -i another_module.ipynb
If you're using other methods to print logs (e.g., sys.stdout.write(), logging), it would be harder to create mocks for them. In that case, I would suggest redirecting the stdout or stderr pipe to /dev/null:
import os
import sys
sys.stdout = open(os.devnull, "w")
%run -i another_module.ipynb
Both methods are considered hacks and should only be used when you know the consequences. The better thing to do here is to change your code in the notebook, either to add a --verbose flag to control logging, or use some logging library (e.g., logging) that supports turning off logging entirely.
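As a sketch of that last suggestion (the logger name and messages are illustrative), another_module.ipynb could do its reporting through logging instead of print:

import logging

logger = logging.getLogger("another_module")   # hypothetical logger name
logging.basicConfig(level=logging.INFO)

logger.info("data loaded")   # use this instead of bare print() calls

# then, in the main notebook, silence it before %run with:
# logging.getLogger("another_module").setLevel(logging.CRITICAL)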
import sys
import os
log = os.system('cat /var/log/dmesg')
This code prints the file by running the shell command cat /var/log/dmesg. However, the output is not copied into log. I want to use this data somewhere else, or just print it with print log.
How can I implement this?
Simply read from the file yourself:
with open('/var/log/dmesg') as logf:
    log = logf.read()
print(log)
As an option, take a look at IPython. Interactive Python brings a lot of ease of use tools to the table.
In [2]: log = !dmesg
In [3]: type(log)
Out[3]: IPython.utils.text.SList
In [4]: len(log)
Out[4]: 314
The log = !dmesg line calls out to the system and captures stdout into the variable as a string list (SList).
IPython was made for collaborative science but is handy for general-purpose Python coding too. The web-based collaborative Notebook (with interactive graphing, akin to Sage notebooks) is a sweet bonus feature as well, along with the ubiquitous support for parallel computing.
http://ipython.org
To read the output of a child process, you can either use fork(), pipe() and exec() from the os module, or use the subprocess module.
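A minimal subprocess sketch of capturing the command's output as a string (Python 3.7+ for capture_output):

from subprocess import run

result = run(["dmesg"], capture_output=True, text=True)
log = result.stdout   # the command's output, unlike the exit status os.system() returns
print(log)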
I'm writing a little python IDE, and I want to add simple debugging. I don't need all the features of winpdb.
How do I launch a python program (by file name) with a breakpoint set at a line number so that it runs until that line number and halts?
Note that I don't want to do this from the command-line, and I don't want to edit the source (by inserting set_trace, for example). And I don't want it to stop at the first line so I have to run the debugger from there. I've tried all the obvious ways with pdb and bdb, but I must be missing something.
Pretty much the only viable way to do it (as far as I know) is to run Python as a subprocess from within your IDE. This avoids "pollution" from the current Python interpreter, which makes it fairly likely that the program will run in the same way as if you had started it independently. (If you have issues with this, check the subprocess environment.) In this manner, you can run a script in "debug mode" using
p = subprocess.Popen(args=[sys.executable, '-m', 'pdb', 'scriptname.py', 'arg1'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     text=True)  # text mode so communicate() can send/receive strings
This will start up Python at the debugger prompt. You'll need to run some debugger commands to set breakpoints, which you can do like so:
o,e = p.communicate('break scriptname.py:lineno')
If this works, o should be the normal output of the Python interpreter after it sets a breakpoint, and e should be empty. I'd suggest you play around with this and add some checks in your code to ensure whether the breakpoints were properly set.
After that, you can start the program running with
p.communicate('continue')
At this point you'd probably want to hook the input, output, and error streams up to the console that you're embedding in your IDE. You would probably need to do this with an event loop, roughly like so:
while p.returncode is None:
    o, e = p.communicate(console.read())
    console.write(o)
    console.write(e)
You should consider that snippet to be effectively pseudocode, since depending on how exactly your console works, it'll probably take some tinkering to get it right.
If this seems excessively messy, you can probably simplify the process a bit using the features of Python's pdb and bdb modules (I'm guessing "Python debugger" and "basic debugger" respectively). The best reference on how to do this is the source code of the pdb module itself. Basically, the way the responsibilities of the modules are split is that bdb handles "under the hood" debugger functionality, like setting breakpoints, or stopping and restarting execution; pdb is a wrapper around this that handles user interaction, i.e. reading commands and displaying output.
For your IDE-integrated debugger, it would make sense to adjust the behavior of the pdb module in two ways that I can think of:
have it automatically set breakpoints during initialization, without you having to explicitly send the textual commands to do so
make it take input from and send output to your IDE's console
Just these two changes should be easy to implement by subclassing pdb.Pdb. You can create a subclass whose initializer takes a list of breakpoints as an additional argument:
class MyPDB(pdb.Pdb):
    def __init__(self, breakpoints, completekey='tab',
                 stdin=None, stdout=None, skip=None):
        pdb.Pdb.__init__(self, completekey, stdin, stdout, skip)
        self._breakpoints = breakpoints
The logical place to actually set up the breakpoints is just after the debugger reads its .pdbrc file, which occurs in the pdb.Pdb.setup method. To perform the actual setup, use the set_break method inherited from bdb.Bdb:
    def setInitialBreakpoints(self):
        _breakpoints = self._breakpoints
        self._breakpoints = None  # to avoid setting breaks twice
        for bp in _breakpoints:
            self.set_break(filename=bp.filename, lineno=bp.line,
                           temporary=bp.temporary, cond=bp.conditional,
                           funcname=bp.funcname)

    def setup(self, f, t):
        pdb.Pdb.setup(self, f, t)
        self.setInitialBreakpoints()
This piece of code would work for each breakpoint being passed as e.g. a named tuple. You could also experiment with just constructing bdb.Breakpoint instances directly, but I'm not sure if that would work properly, since bdb.Bdb maintains its own information about breakpoints.
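For instance, the breakpoint records could simply be named tuples; the field names below just mirror the attributes used in setInitialBreakpoints() above, and the file name and line number are illustrative:

from collections import namedtuple

Breakpoint = namedtuple('Breakpoint',
                        ['filename', 'line', 'temporary', 'conditional', 'funcname'])

# e.g. an unconditional, permanent breakpoint at line 42 of scriptname.py
breakpoints = [Breakpoint('scriptname.py', 42, False, None, None)]
debugger = MyPDB(breakpoints)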
Next, you'll need to create a new main method for your module which runs it the same way pdb runs. To some extent, you can copy the main method from pdb (and the if __name__ == '__main__' statement of course), but you'll need to augment it with some way to pass in the information about your additional breakpoints. What I'd suggest is writing the breakpoints to a temporary file from your IDE, and passing the name of that file as a second argument:
tmpfilename = ...
# write breakpoint info
p = subprocess.Popen(args=[sys.executable, '-m', 'mypdb', tmpfilename, ...], ...)
# delete the temporary file
Then in mypdb.main(), you would add something like this:
def main():
    # code excerpted from pdb.main()
    ...
    del sys.argv[0]

    # add this
    bpfilename = sys.argv[0]
    with open(bpfilename) as f:
        # read breakpoint info
        breakpoints = ...
    del sys.argv[0]

    # back to excerpt from pdb.main()
    sys.path[0] = os.path.dirname(mainpyfile)
    pdb = MyPDB(breakpoints)  # modified to construct the subclass
Now you can use your new debugger module just like you would use pdb, except that you don't have to explicitly send break commands before the process starts. This has the advantage that you can directly hook the standard input and output of the Python subprocess to your console, if it allows you to do that.
I often have the case that I'll be writing a script, and I'm up to a part of the script where I want to play around with some of the variables interactively. Getting to that part requires running a large part of the script I've already written.
In this case it isn't trivial to run this program from inside the shell. I would have to recreate the conditions of that function somehow.
What I want to do is call a function, like runshell(), which will run the python shell at that point in the program, keeping all variables in scope, allowing me to poke around in it.
How would I go about doing that?
import code
code.interact(local=locals())
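If you want to wrap that in a helper like the runshell() you describe, here is a minimal sketch (the helper name and the frame introspection are mine, not part of the code module):

import code
import inspect

def runshell():
    # pull in the caller's globals and locals so they're visible in the shell
    frame = inspect.currentframe().f_back
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)
    code.interact(local=namespace)

Call runshell() at the point of interest; Ctrl-D (or exit()) returns control to your script.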
But using the Python debugger is probably more what you want:
import pdb
pdb.set_trace()
By far the most convenient method that I have found is:
import IPython
IPython.embed()
You get all your global and local variables and all the creature comforts of IPython: tab completion, auto indenting, etc.
You have to install the IPython module to use it of course:
pip install ipython
For practicality I'd like to add that you can put the debugger trace in a one liner:
import pdb; pdb.set_trace()
Which is a nice line to add to an editor that supports snippets, like TextMate or Vim+SnipMate. I have it set up to expand "break" into the above one liner.
You can use the python debugger (pdb) set_trace function.
For example, given a script like this:

def whatever():
    x = 3
    import pdb
    pdb.set_trace()

if __name__ == '__main__':
    whatever()
You get the scope at the point when set_trace is called:
$ python ~/test/test.py
--Return--
> /home/jterrace/test/test.py(52)whatever()->None
-> pdb.set_trace()
(Pdb) x
3
(Pdb)
Not exactly a perfect source, but I've written a few manholes before; here is one I wrote for an abandoned pet project: http://code.google.com/p/devdave/source/browse/pymethius/trunk/webmud/handlers/konsole.py
And here is one from the Twisted library: http://twistedmatrix.com/trac/browser/tags/releases/twisted-8.1.0/twisted/manhole/telnet.py (the console logic is in Shell.doCommand).