Saving `h2o_model.accuracy` printed output to a file - python

h2o_model.accuracy prints model validation data when executed in a Jupyter Notebook cell (which is desirable, despite the function name). How to save this whole validation output (entire notebook cell contents) to a file? Please test before suggesting redirections.

I'd be careful using %%capture, it doesn't capture html content (tables) in the stdout.
The redirect_stdout works flawlessly when used from python CLI/script. IPython/Jupyter might cause issues with tables as they are displayed not printed. Note that you should not use .readlines() to get the results from StringIO - use .getvalue().
You can use h2o_model.save_model_details(path) to persist information about the model to a JSON file (which might serve you better in the long run, but it's not really human-readable).
If you really want output that looks like what you would get from a Jupyter notebook, you can use the following hack:
create a template jupyter notebook that contains:
import os
import h2o
h2o.connect(verbose=False)
h2o.get_model(os.environ["H2O_MODEL"])
and in your original notebook add
!H2O_MODEL={h2o_model.key} jupyter nbconvert --to html --execute template.ipynb --output={h2o_model.key}_results.html
You can also create a template for the nbconvert to hide the code cells.

You should call h2o_model.accuracy() (note the parentheses). The reason the whole model gets printed is the non-idiomatic implementation of __repr__ in h2o models, which prints rather than returning a string (there's a JIRA to fix that).
If you encounter some other situation where you would like to save printed output of some command, you can use redirect_stdout[1] to capture it (assuming you have python 3.4+).
[1] https://docs.python.org/3.9/library/contextlib.html#contextlib.redirect_stdout
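As a minimal sketch of the redirect_stdout approach in a plain Python script (not a notebook, where tables are displayed rather than printed): print_report below is a hypothetical stand-in for any call that prints its results, and the output file name is arbitrary.

```python
import io
import os
import tempfile
from contextlib import redirect_stdout


def print_report():
    """Hypothetical stand-in for any call that prints its results."""
    print("accuracy: 0.93")
    print("threshold: 0.50")


def capture_stdout(func, *args, **kwargs):
    """Run func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    # getvalue() returns the whole buffer; readlines() would return nothing
    # here because the stream position sits at the end after writing.
    return buffer.getvalue()


captured = capture_stdout(print_report)

# Write the captured text to a file (a temporary directory is used only
# to keep the example self-contained).
out_path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(captured)
```

This only captures text that goes through sys.stdout, so it would not help with rich HTML output in Jupyter.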

Ok, so only the h2o_model.accuracy output cannot be captured, while xgb_model.cross_validation_metrics_summary or even h2o_model alone can - e.g. like that:
%%capture captured_output
# print model validation
# data to `captured_output`
xgb_model
In another notebook cell:
# print(captured_output.stdout.replace("\n\n","\n"))
with open(filename, 'w') as f:
    f.write((captured_output.stdout.replace("\n\n","\n")))

Related

How do I only show certain cells input or output when exporting a Jupyter Notebook from VSCode?

I only want certain cells and certain cell outputs to show up when I export my Jupyter Notebook from VSCode. I have not been able to get an answer that works from Google, StackOverflow, and ChatGPT.
So when I export the .ipynb file to HTML in VSCode, how do I modify which cells are included in the HTML and which are not? For example, what would I do to include just the output of the cell below and not the actual code?
import pandas as pd
import seaborn as sns
df = pd.read_csv("file.csv")
sns.histplot(df['Variable 1'])
This post seems to indicate the best/only option is tagging cells then removing them with nbconvert. This seems inefficient in VSCode, especially compared to the easy output = FALSE or echo = FALSE in RStudio.
This seems like it should be an easy and common question, but I am getting no good solutions from the internet. ChatGPT suggested adding #hide-in-export to the cells I didn't want, but that didn't work.
The StackOverflow post I linked suggested using TagRemovePreprocessor with nbconvert and marking all the cells I want gone but that seems so clunky. Follow-up question: If tagging cells and removing them in export with nbconvert, what is the fastest way to tag cells in VSCode?
Although it is a bit cumbersome, I think this is still a feasible method: press F12 to open the browser's developer tools, then delete the cells or cell outputs you don't want.
I still don't know if there is an easier way but here is what I have done with help from ChatGPT, this blog post, and this StackOverflow answer.
First, have a function that adds cell tags to the certain cells you want to hide:
import json
def add_cell_tag(nb_path, tag, cell_indices):
    # Open the .ipynb file
    with open(nb_path, 'r', encoding='utf-8') as f:
        nb = json.load(f)
    # Get the cells from the notebook
    cells = nb['cells']
    # Add the tag to the specified cells
    for index in cell_indices:
        cell = cells[index]
        if 'metadata' not in cell:
            cell['metadata'] = {}
        if 'tags' not in cell['metadata']:
            cell['metadata']['tags'] = []
        cell['metadata']['tags'].append(tag)
    # Save the modified notebook
    with open(nb_path, 'w', encoding='utf-8') as f:
        json.dump(nb, f)
Second, run the function and add a tag (can be any string) to the cells you want to hide in the HTML export:
add_cell_tag(nb_path, 'hide-code', [0, 1, 2])
Finally, use nbconvert in the terminal to export and filter the notebook:
jupyter nbconvert --to html --TagRemovePreprocessor.remove_cell_tags=hide-code path/to/notebook.ipynb
The cells may be removed entirely, or just the output or just the input:
TagRemovePreprocessor.remove_input_tags
TagRemovePreprocessor.remove_single_output_tags
TagRemovePreprocessor.remove_all_outputs_tags
Not sure the difference between those last two. Additionally, I had a helper function to count the cells in the notebook and one to clear all tags in the notebook.
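The two helpers mentioned above are not shown in the post, so the following is only a guess at what they might look like. The names count_cells and clear_all_tags are hypothetical, and the small demo notebook exists just to make the sketch runnable.

```python
import json
import os
import tempfile


def count_cells(nb_path):
    """Return the number of cells in the notebook at nb_path."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    return len(nb["cells"])


def clear_all_tags(nb_path):
    """Remove the 'tags' entry from every cell; return how many tags were removed."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    removed = 0
    for cell in nb["cells"]:
        # Cells without metadata or without tags are left untouched.
        tags = cell.get("metadata", {}).pop("tags", [])
        removed += len(tags)
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
    return removed


# Demo on a tiny hand-written notebook (hypothetical content).
demo_nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["hide-code"]},
         "source": ["x = 1"], "outputs": [], "execution_count": None},
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump(demo_nb, f)

n_cells = count_cells(demo_path)
n_removed = clear_all_tags(demo_path)
```

Clearing tags before re-running add_cell_tag avoids the same tag being appended to a cell more than once.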

Alternative ways of logging cell output in jupyterLab

I have successfully replicated the logging methodology by @Mercury in this post: Reconnecting remote Jupyter Notebook and get current cell output
Namely, adding this code chunk to my notebook:
import sys
import logging
nblog = open("nb.log", "a+")
sys.stdout.echo = nblog
sys.stderr.echo = nblog
get_ipython().log.handlers[0].stream = nblog
get_ipython().log.setLevel(logging.INFO)
My main edit to that code is replacing a+ with w+ because I want to overwrite the log file every time I rerun my notebook.
However, I would like my logger to include information from cell outputs that aren't explicitly printed. For example, if I do head(df) in a cell instead of print(head(df)). Is that possible?
Thanks!

load code from a code cell from one jupyter notebook into another jupyter notebook

I want to load (i.e., copy the code as with %load) the code from a code cell in one jupyter notebook into another jupyter notebook (Jupyter running Python, but not sure if that matters). I would really like to enter something like
%load cell[5] notebookname.ipynb
The command copies all code in cell 5 of notebookname.ipynb to the code cell of the notebook I am working on. Does anybody know a trick how to do that?
Adapting some code found here at Jupyter Notebook, the following will display the code of a specific cell in the specified notebook:
import io
from nbformat import read
def print_cell_code(fname, cellno):
    with io.open(fname, 'r', encoding='utf-8') as f:
        nb = read(f, 4)
    cell = nb.cells[cellno]
    print(cell.source)
print_cell_code("Untitled.ipynb",2)
Not sure what you want to do once the code is there, but maybe this can be adapted to suit your needs. Try print(nb.cells) to see what read brings in.
You'll probably want to use or write your own nbconvert preprocessor to extract a cell from one notebook and insert it into another. Understanding how to write a preprocessor takes a good amount of digging through the docs, but this is the preferred way.
The quick-fix option relies on the fact that the nbformat specification is based on JSON, which means that if you read an .ipynb file with pure Python (i.e. with open and read), you can call json.loads on it to turn the entire file into a dict. From there, you can access cells in the cells entry (which is a list of cells). So, something like this:
import json
with open("nb1.ipynb", "r") as nb1, open("nb2.ipynb", "r") as nb2:
    nb1, nb2 = json.loads(nb1.read()), json.loads(nb2.read())
nb2["cells"].append(nb1["cells"][0]) # adds nb1's first cell to end of nb2
This assumes (as does your question) there is no metadata conflict between the notebooks.
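The snippet above only changes nb2 in memory. A slightly fuller sketch that also writes the result back to disk could look like this; copy_cell and the demo notebooks are hypothetical, and it still assumes there are no metadata or cell-id conflicts between the two notebooks.

```python
import json
import os
import tempfile


def copy_cell(src_path, dst_path, cell_index):
    """Append cell number cell_index of src_path to the end of dst_path."""
    with open(src_path, "r", encoding="utf-8") as f:
        src = json.load(f)
    with open(dst_path, "r", encoding="utf-8") as f:
        dst = json.load(f)
    cell = src["cells"][cell_index]
    dst["cells"].append(cell)
    # Write the modified destination notebook back to disk.
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(dst, f, indent=1)
    return cell


def write_demo_notebook(path, source):
    """Write a one-cell notebook, used only to make this example runnable."""
    nb = {
        "cells": [{"cell_type": "code", "metadata": {}, "source": source,
                   "outputs": [], "execution_count": None}],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f)


tmp_dir = tempfile.mkdtemp()
nb1_path = os.path.join(tmp_dir, "nb1.ipynb")
nb2_path = os.path.join(tmp_dir, "nb2.ipynb")
write_demo_notebook(nb1_path, "print('from nb1')")
write_demo_notebook(nb2_path, "print('from nb2')")

copied = copy_cell(nb1_path, nb2_path, 0)
```

The destination notebook has to be reloaded in Jupyter afterwards for the new cell to show up.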

Jupyter - export current ipynb file to html within the notebook

I'm trying to write python code in my jupyter notebook, that will export the whole current notebook to html/pdf.
I know I can do something like that:
import subprocess
subprocess.call(["jupyter", "nbconvert", "--to", "html", "notebook.ipynb"])
But that requires the file name, which is not known inside the notebook.
I saw many "hacks" to get the notebook file name, like this one:
How do I get the current IPython Notebook name
but I prefer to find a better solution.
Is there a smart way to export the current running notebook to a html/pdf file?
Thanks.
You can use the papermill library, which executes the notebook with injected parameters (basically the notebook name, which you define outside the notebook).
Python program
import papermill as pm
pm.execute_notebook(
    'notebooks/notebookA.ipynb', # notebook to execute
    'notebooks/temp.ipynb', # temporary notebook that will contain the outputs and will be used to save the HTML
    report_mode=True, # To hide the injected parameters cell, but not working for me
    parameters=dict(filename_ipynb='notebooks/temp.ipynb',
                    filename_html='output/notebookA.html')
)
/notebooks/notebookA.ipynb
(To insert as the last cell of your notebook)
import os
os.system('jupyter nbconvert --output ../' + filename_html + ' --to html ' + filename_ipynb)
One caveat:
papermill adds a cell that sets the injected parameters. Setting report_mode=True should hide it, but somehow it doesn't work, even though it should: https://github.com/nteract/papermill/issues/253
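As a side note, the string concatenation passed to os.system can be replaced with an argument list, which avoids quoting problems with paths. This is only a sketch: build_nbconvert_command and export_notebook are hypothetical helper names, and the notebook path still has to be supplied by the caller (for example through the papermill parameters above).

```python
import subprocess


def build_nbconvert_command(notebook_path, to="html", output=None):
    """Build the argument list for a `jupyter nbconvert` call."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if output is not None:
        cmd += ["--output", output]
    cmd.append(notebook_path)
    return cmd


def export_notebook(notebook_path, to="html", output=None):
    """Run nbconvert; raises CalledProcessError if the export fails."""
    subprocess.run(build_nbconvert_command(notebook_path, to, output), check=True)


# Only build and show the command here; export_notebook(...) would run it.
command = build_nbconvert_command("notebooks/temp.ipynb", output="notebookA")
print(" ".join(command))
```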

print data_frame.head gives output not as a nice table

I have a data set taken from kaggle, and I want to get the result shown here
So, I took that code, changed it a bit and what I ran is this:
# get titanic & test csv files as a DataFrame
titanic_df = pd.read_csv("./input/train.csv")
test_df = pd.read_csv("./input/test.csv")
# preview the data
print titanic_df.head()
This works, as it outputs the right data, but not as neatly as in the tutorial... Can I make it right?
Here is my output (Python 2, Spyder):
Try using Jupyter notebook if you have not used it before. In ipython console, it will wrap the text and show it in multiple lines. In kaggle, what you are seeing is itself a jupyter notebook.
