I'm using several Jupyter notebooks to split my tasks between different modules. In my main notebook I call another module with %run another_module.ipynb, which loads all my data. However, it also plots and prints everything I have in another_module.ipynb.
I want to keep the plots in another_module.ipynb to help me visualise the data, but I don't want to reprint everything when calling %run another_module.ipynb. Is there an option to prevent printing this?
Thanks
You could:
Override the print function and make it a no-op:
_print_function = print # create a backup in case you need it later
globals()["print"] = lambda *args, **kwargs: None
Run the file with the -i flag. Without -i, the file is run in a new namespace, so your modifications to the global variables are lost; with -i, the file is run in the current namespace.
%run -i another_module.ipynb
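If you need printed output again later, put the backup from above back in place:
globals()["print"] = _print_function  # undo the no-op override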
If you're using other methods to print logs (e.g., sys.stdout.write(), logging), it would be harder to create mocks for them. In that case, I would suggest redirecting the stdout or stderr pipe to /dev/null:
import os
import sys
sys.stdout = open(os.devnull, "w")
%run -i another_module.ipynb
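When you want output back, restore the real stream; the interpreter keeps the original in sys.__stdout__:
sys.stdout.close()           # close the devnull handle
sys.stdout = sys.__stdout__  # the interpreter's original stdout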
Both methods are hacks and should only be used when you understand the consequences. The better fix is to change the code in the notebook itself: either add a --verbose flag that gates the printing, or use a logging library (e.g., logging) that supports turning logging off entirely.
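For instance, with the standard logging module a single call silences everything up to a chosen level (a minimal sketch; pick whatever threshold suits you):
import logging
logging.disable(logging.CRITICAL)  # drop all records at CRITICAL and below
# ... run the noisy code ...
logging.disable(logging.NOTSET)    # turn logging back on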
Related
I have two Python files (main.py and main_test.py). The file main_test.py is executed within main.py. When I do not use a log file, this is what gets printed out:
Main file: 17:41:18
Executed file: 17:41:18
Executed file: 17:41:19
Executed file: 17:41:20
When I use a log file and execute python3 main.py > log, I get the following:
Executed file: 17:41:18
Executed file: 17:41:19
Executed file: 17:41:20
Main file: 17:41:18
Also, when I use python3 main.py | tee log to print out and log the output, it waits and prints everything only after the script finishes. In addition, the problem of the reversed order remains.
Questions
How can I fix the reversed print out?
How can I print out results simultaneously in terminal and log them in a correct order?
Python files for replication
main.py
import os
import time
import datetime
import pytz
python_file_name = 'main_test'+'.py'
time_zone = pytz.timezone('US/Eastern') # Eastern-Time-Zone
curr_time = datetime.datetime.now().replace(microsecond=0).astimezone(time_zone).time()
print(f'Main file: {curr_time}')
cwd = os.path.join(os.getcwd(), python_file_name)
os.system(f'python3 {cwd}')
main_test.py
import pytz
import datetime
import time
time_zone = pytz.timezone('US/Eastern') # Eastern-Time-Zone
for i in range(3):
    curr_time = datetime.datetime.now().replace(microsecond=0).astimezone(time_zone).time()
    print(f'Executed file: {curr_time}')
    time.sleep(1)
When you run a script like this:
python main.py > log
The shell redirects output from the script to a file called log, and any subprocess the script launches (which is what os.system() does) inherits that redirection, so its output ends up in the same file.
The reversed order comes from buffering: when stdout is a file rather than a terminal, Python block-buffers it, so your "Main file" line sits in the parent's buffer until the interpreter exits, while the child process writes and flushes its own lines first, when it exits.
The same buffering explains the tee behaviour: a pipe is also block-buffered, so tee only receives (and prints) the output when the buffers are flushed at process exit. tee itself writes as it reads; the delay is on the Python side, so what you're seeing is expected behaviour.
Why bother with shells at all though? Why not write a few functions to call, and import the other Python module to call its functions? Or, if you need things to run in parallel (which they didn't in your example), look at multiprocessing.
In direct response to your questions:
"How can I fix the reversed print out?"
Flush the parent's output before launching the child (print(..., flush=True) or sys.stdout.flush()), write to the file directly from the script instead of relying on redirection, or capture the output from the subprocesses and write it out through your main script in the order you want.
"How can I print out results simultaneously in terminal and log them in a correct order?"
You should probably just do it in the script; otherwise this is not really a Python question, and you should try Super User or similar sites to see if there's some way to have tee or similar tools write through live.
In general though, unless you have really strong reasons to run the other functionality in separate shells, you should look at solving your problems in the Python script itself. And if you can't, you can use something like Popen or its derivatives to capture the subscript's output and do what you need, instead of relying on tools that may or may not be available on the host OS running your script.
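For example, the smallest in-script fix for the ordering problem is to flush the parent's line before spawning the child (a tweak to the main.py from the question):
print(f'Main file: {curr_time}', flush=True)  # flush before the child writes
os.system(f'python3 {cwd}')
Running the parent unbuffered (python3 -u main.py > log) has the same effect without touching the code.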
I want to define some globals in some number-crunching work I am doing. I am writing the script incrementally and don't want previous results to keep being reloaded/recalculated. One approach is to split mature code out into a separate file and only run new code interactively, but I just want to do it in a single file for speed of development.
I was under the assumption that a global defined in a file would persist between invocations of run, but it does not.
So my script has the following chunk of code:
if globals().has_key('all_post_freq') != True:
    print "creating all post freq var"
    global all_post_freq
    all_post_freq = all_post_freq_("pickle/all_post_freq.pickle")
How do I retain all_post_freq between invocations of IPython's %run?
Edit
OK, I have split stuff up into files, but I know there must be a way of doing what I need to do :D
When you %run a file, it is normally started in a blank namespace, and its globals are added to the interactive namespace when it finishes. There's a -i flag which will run it directly in the interactive namespace, so it will see variables you've already defined:
%run -i myscript.py
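With -i, a guard like the one from the earlier question does what you'd expect; here is a Python 3 sketch, with a hypothetical load_data() standing in for the expensive pickle load:
# myscript.py -- run with: %run -i myscript.py
if 'all_post_freq' not in globals():
    print("creating all post freq var")
    all_post_freq = load_data()  # hypothetical expensive load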
I have a script named "poc.py" that takes one positional command line argument, "inputfile.txt". The poc.py script uses argparse to handle the positional argument and then passes the parsed args to main(). Once in main(), I read the input file, do some processing, create a pandas DataFrame, and finally plot the data. I am having difficulty manipulating my DataFrame and controlling the exact formatting of the resulting plot, so I'd like to try IPython to explore this interactively and see if I can get a better grasp on the "pythonic" ways to handle pandas/matplotlib.
So I tried experimenting with ipython and running a script, but I can't get ipython to keep my script's namespace.
I have tried this:
$ ipython --pylab -i poc.py inputfile.txt
Which runs my script just fine, and displays the plots (even without the blocking plt.show() call), but when the script is finished, the ipython who and whos commands say Interactive namespace is empty. Likewise if I first enter the ipython shell and then do:
In [2]: run poc.py inputfile.txt
when the script is done (again, plenty of output, plots show up) I get the same result: an empty interactive namespace.
What am I missing in terms of understanding how to run an external script and use IPython to interactively explore the data/objects in my script?
Here is a barebones example of how my script (poc.py) is setup:
import argparse
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# etc ...more libraries and custom functions here...

def main(args):
    data = np.genfromtxt(args.inputfile)
    # (omitted)...more data processing / manipulation...
    pdata = pd.DataFrame(data)
    # (omitted)...more data processing / manipulation...
    plt.plot(pdata)
    # (omitted)...some formatting of the matplotlib/axes/figure objects
    plt.show()

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='''some program...''')
    parser.add_argument('inputfile', help='path to input file')
    args = parser.parse_args()
    main(args)
Answering myself here. I am aware that this could be a comment, but I decided the description warrants a standalone answer.
Basically, after spending some time understanding the issues everyone mentioned about my variables going out of scope, and the different ways to handle that, I found another solution that worked for me. I ended up using the embed() function from IPython. For the debugging process, I would add
...
from IPython import embed
embed()
...
at the point in the script where I wanted to stop and look around. Once dropped into the IPython shell, I could investigate variable dimensions and experiment with manipulating things. When I found the combination I wanted, I would copy the commands, drop out of the interactive interpreter, and modify the script. The reason this worked for me is that it did not require modifying the structure of the program simply to get debugging information.
The problem here is that data and pdata aren't in your script's namespace; they're in the local namespace of a function that's already run and completed.
If you want to be able to inspect them after the fact, you'll need to store them somewhere. For example:
    # ...
    plt.show()
    return data, pdata, plt

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='''some program...''')
    parser.add_argument('inputfile', help='path to input file')
    args = parser.parse_args()
    data, pdata, plt = main(args)
(Alternatively, you could just make all of main's variables global, but this way seems cleaner.)
Now, your script's namespace includes variables named data, pdata, and plt that have the values you want. Plus, you can call main again and pass it a different file and get back the values from that file.
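For instance, from the interactive prompt you could build a second args object by hand and rerun main (argparse.Namespace is a convenient stand-in; 'otherfile.txt' is just a hypothetical path):
import argparse
args2 = argparse.Namespace(inputfile='otherfile.txt')
data2, pdata2, plt = main(args2)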
This question may be related to Use python subprocess module like a command line simulator.
I have written some infrastructure code called my_shell, to which you can pass shell commands for my application. It looks like this:
class ApplicationTestShell(object):
    def __init__(self):
        '''
        Constructor
        '''
        self.play_ground_dir = "/var/tmp/MyAppDir"
        ensure_dir_exists_and_empty(self.play_ground_dir)

    def execute_command(self, command, on_success = None, on_failure = None):
        p = create_shell_process(self, self.play_ground_dir)
        sout, serr = p.communicate(input = command)
        if p.returncode == 0:
            on_success(sout)
        else:
            on_failure(serr)

    def create_shell_process(self, cwd):
        return Popen("/bin/bash", env= {WHAT DO I DO HERE?}, cwd = test_dir, stdout=PIPE, stderr=PIPE, stdin=PIPE)
The interesting bit to me here is the env parameter. Python expects a 'map'-like data structure (a dict) of all the environment variables. My application requires several variables to be set and exported. The script that sets and exports them is generated by running, say, /bin/appload myapp (assume appload is always available on the path). What I do currently is, when I call p.communicate, the following:
p.communicate(input = "eval `/bin/appload myapp`;" + command)
So basically before running the command I call the infrastructure setup.
Is there any way to do this in a better fashion in Python. I somehow want to push the eval /bin/appload part to the env parameter on the Popen class OR as part of the shell creation process.
What are the problems with my current implementation? (I feel it is hacky but I may be wrong)
It depends on how /bin/appload myapp works. If it only guarantees that it will output bash syntax, then parsing that output in Python in order to construct the environment object there is almost certainly more trouble than it's worth (you might need to support parameter and variable expansion, subshells, process substitution, etc, etc). On the other hand, if you are sure that /bin/appload myapp will only ever output lines of the form "VARIABLENAME=someword", then that's pretty trivial to parse in Python and you could move it into your Python code if you like.
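If the output really is restricted to that simple form, the parsing side might look like this (a sketch assuming /bin/appload prints one NAME=value pair per line):
import subprocess

def load_app_env(app):
    out = subprocess.check_output(['/bin/appload', app], universal_newlines=True)
    env = {}
    for line in out.splitlines():
        name, sep, value = line.strip().partition('=')
        if sep:  # ignore anything that isn't NAME=value
            env[name] = value
    return env
The resulting dict could then be passed straight to Popen's env parameter (possibly merged into a copy of os.environ).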
There are an awful lot of different directions you could go with these requirements:
You could capture the output of appload myapp into a tempfile and set the subprocess's $BASH_ENV to that filename; that would cause the shell to source your environment setup before running your command, in a way that some might consider cleaner (see the sketch after this list).
You could give your command (with the eval-ing prefix) as the first argument to Popen and pass shell=True, letting Popen do the bash invocation on its own (setting $SHELL explicitly to bash if necessary).
You could use bash's -c option to specify the code to run on the command line rather than via stdin.
You could take a multi-tiered approach: invoke a shell from Python which eval's the appload myapp environment and then exec's another shell underneath it, so that the first doesn't show up in ps listings and the command given to create_shell_process has the shell all to itself (although that shouldn't really matter).
Which to pick depends on your concerns about how the shell is invoked, how it looks in ps listings, whether you want your command to still run if the appload myapp output produces an error when eval'd, and so on. But for a general solution, I think what you have is perfectly fine.
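Of those options, the $BASH_ENV variant is the easiest to sketch (here command is the shell command string from execute_command; error handling and temp-file cleanup are omitted):
import os
import subprocess
import tempfile

setup = subprocess.check_output(['/bin/appload', 'myapp'])
with tempfile.NamedTemporaryFile(suffix='.sh', delete=False) as f:
    f.write(setup)  # bash sources $BASH_ENV when started non-interactively
p = subprocess.Popen(['/bin/bash', '-c', command],
                     env=dict(os.environ, BASH_ENV=f.name),
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)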
I don't see any real problems with the implementation, besides cosmetic things or minor things that probably only came from copying and pasting the code: create_shell_process doesn't use its cwd parameter, and the on_success and on_failure parameters look like they're optional but the defaults will break things (you can't call None).
I'm writing a little python IDE, and I want to add simple debugging. I don't need all the features of winpdb.
How do I launch a python program (by file name) with a breakpoint set at a line number so that it runs until that line number and halts?
Note that I don't want to do this from the command-line, and I don't want to edit the source (by inserting set_trace, for example). And I don't want it to stop at the first line so I have to run the debugger from there. I've tried all the obvious ways with pdb and bdb, but I must be missing something.
Pretty much the only viable way to do it (as far as I know) is to run Python as a subprocess from within your IDE. This avoids "pollution" from the current Python interpreter, which makes it fairly likely that the program will run in the same way as if you had started it independently. (If you have issues with this, check the subprocess environment.) In this manner, you can run a script in "debug mode" using
import subprocess
import sys

p = subprocess.Popen(args=[sys.executable, '-m', 'pdb', 'scriptname.py', 'arg1'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     universal_newlines=True)  # text mode, so communicate() takes and returns str
This will start up Python at the debugger prompt. You'll need to run some debugger commands to set breakpoints, which you can do like so:
o,e = p.communicate('break scriptname.py:lineno')
If this works, o should be the normal output of the Python interpreter after it sets a breakpoint, and e should be empty. I'd suggest you play around with this and add some checks in your code to verify that the breakpoints were properly set.
After that, you can start the program running with
p.communicate('continue')
At this point you'd probably want to hook the input, output, and error streams up to the console that you're embedding in your IDE. You would probably need to do this with an event loop, roughly like so:
while p.returncode is None:
    o, e = p.communicate(console.read())
    console.write(o)
    console.write(e)
You should consider that snippet to be effectively pseudocode, since depending on how exactly your console works, it'll probably take some tinkering to get it right.
If this seems excessively messy, you can probably simplify the process a bit using the features of Python's pdb and bdb modules (I'm guessing "Python debugger" and "basic debugger" respectively). The best reference on how to do this is the source code of the pdb module itself. Basically, the way the responsibilities of the modules are split is that bdb handles "under the hood" debugger functionality, like setting breakpoints or stopping and restarting execution; pdb is a wrapper around this that handles user interaction, i.e. reading commands and displaying output.
For your IDE-integrated debugger, it would make sense to adjust the behavior of the pdb module in two ways that I can think of:
have it automatically set breakpoints during initialization, without you having to explicitly send the textual commands to do so
make it take input from and send output to your IDE's console
Just these two changes should be easy to implement by subclassing pdb.Pdb. You can create a subclass whose initializer takes a list of breakpoints as an additional argument:
import pdb

class MyPDB(pdb.Pdb):
    def __init__(self, breakpoints, completekey='tab',
                 stdin=None, stdout=None, skip=None):
        pdb.Pdb.__init__(self, completekey, stdin, stdout, skip)
        self._breakpoints = breakpoints
The logical place to actually set up the breakpoints is just after the debugger reads its .pdbrc file, which occurs in the pdb.Pdb.setup method. To perform the actual setup, use the set_break method inherited from bdb.Bdb:
    # (these methods continue the MyPDB class body)
    def setInitialBreakpoints(self):
        _breakpoints = self._breakpoints
        self._breakpoints = None  # to avoid setting breaks twice
        for bp in _breakpoints:
            self.set_break(bp.filename, bp.line,
                           temporary=bp.temporary, cond=bp.conditional,
                           funcname=bp.funcname)

    def setup(self, f, t):
        pdb.Pdb.setup(self, f, t)
        self.setInitialBreakpoints()
This piece of code would work for each breakpoint being passed as e.g. a named tuple. You could also experiment with just constructing bdb.Breakpoint instances directly, but I'm not sure if that would work properly, since bdb.Bdb maintains its own information about breakpoints.
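For example, the bp records used above could be plain named tuples mirroring set_break's parameters (this shape is my assumption, not something pdb or bdb defines):
from collections import namedtuple

Breakpoint = namedtuple('Breakpoint',
                        'filename line temporary conditional funcname')
breakpoints = [Breakpoint('scriptname.py', 42, False, None, None)]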
Next, you'll need to create a new main method for your module which runs it the same way pdb runs. To some extent, you can copy the main method from pdb (and the if __name__ == '__main__' statement of course), but you'll need to augment it with some way to pass in the information about your additional breakpoints. What I'd suggest is writing the breakpoints to a temporary file from your IDE, and passing the name of that file as a second argument:
tmpfilename = ...
# write breakpoint info
p = subprocess.Popen(args=[sys.executable, '-m', 'mypdb', tmpfilename, ...], ...)
# delete the temporary file
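Any serialization works for that temporary file; as a sketch, JSON pairs naturally with the named-tuple shape suggested above:
import json
import tempfile

with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump([bp._asdict() for bp in breakpoints], f)
tmpfilename = f.name
mypdb.main() would then rebuild the breakpoint records with json.load().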
Then in mypdb.main(), you would add something like this:
def main():
    # code excerpted from pdb.main()
    ...
    del sys.argv[0]
    # add this
    bpfilename = sys.argv[0]
    with open(bpfilename) as f:
        # read breakpoint info
        breakpoints = ...
    del sys.argv[0]
    # back to excerpt from pdb.main()
    sys.path[0] = os.path.dirname(mainpyfile)
    pdb = MyPDB(breakpoints)  # modified
Now you can use your new debugger module just like you would use pdb, except that you don't have to explicitly send break commands before the process starts. This has the advantage that you can directly hook the standard input and output of the Python subprocess to your console, if it allows you to do that.