I'm trying to write Python code in my Jupyter notebook that will export the whole current notebook to HTML/PDF.
I know I can do something like this:
import subprocess
subprocess.call(["jupyter", "nbconvert", "--to", "html", "notebook.ipynb"])
But that requires the file name, which is not known inside the notebook.
I saw many "hacks" to get the notebook file name, like this one:
How do I get the current IPython Notebook name
but I prefer to find a better solution.
Is there a smart way to export the currently running notebook to an HTML/PDF file?
Thanks.
You can use the papermill library, which executes the notebook with injected parameters (here, the notebook file names, which you define outside the notebook).
Python program
import papermill as pm

pm.execute_notebook(
    'notebooks/notebookA.ipynb',  # notebook to execute
    'notebooks/temp.ipynb',  # temporary notebook that will contain the outputs and will be used to save the HTML
    report_mode=True,  # should hide the injected-parameters cell, but this is not working for me
    parameters=dict(filename_ipynb='notebooks/temp.ipynb',
                    filename_html='output/notebookA.html')
)
/notebooks/notebookA.ipynb
(To insert as the last cell of your notebook)
import os
os.system('jupyter nbconvert --output ../' + filename_html + ' --to html ' + filename_ipynb)
One caveat:
papermill adds a cell that sets the injected parameters. Setting report_mode=True should hide it, but that does not work for me, even though it should: https://github.com/nteract/papermill/issues/253
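As a hedged sketch of the nbconvert call in the last cell: passing the arguments as a list to subprocess.run avoids shell quoting problems when a path contains spaces. The helper names build_nbconvert_command and export_notebook are made up for this example; --to and --output are existing nbconvert options, and the export only runs where jupyter is on the PATH.

```python
import subprocess


def build_nbconvert_command(notebook_path, output_path, to="html"):
    """Return the argument list for a `jupyter nbconvert` call.

    Hypothetical helper, not part of papermill or nbconvert.
    """
    return ["jupyter", "nbconvert", "--to", to, "--output", output_path, notebook_path]


def export_notebook(notebook_path, output_path, to="html"):
    """Run nbconvert; raises CalledProcessError if the export fails."""
    cmd = build_nbconvert_command(notebook_path, output_path, to)
    subprocess.run(cmd, check=True)
    return cmd
```

In the last cell of the notebook this could be called as export_notebook(filename_ipynb, filename_html), using the two parameters that papermill injects.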
I only want certain cells and certain cell outputs to show up when I export my Jupyter Notebook from VSCode. I have not been able to get an answer that works from Google, StackOverflow, or ChatGPT.
So when I export the .ipynb file to HTML in VSCode, how do I modify which cells are included in the HTML and which are not? For example, what would I do to include just the output of the cell below and not the actual code?
import pandas as pd
import seaborn as sns
df = pd.read_csv('file.csv')
sns.histplot(df['Variable 1'])
This post seems to indicate the best/only option is tagging cells then removing them with nbconvert. This seems inefficient in VSCode, especially compared to the easy output = FALSE or echo = FALSE in RStudio.
This seems like it should be an easy and common question, but I am getting no good solutions from the internet. ChatGPT suggested adding #hide-in-export to the cells I didn't want, but that didn't work.
The StackOverflow post I linked suggested using TagRemovePreprocessor with nbconvert and marking all the cells I want gone but that seems so clunky. Follow-up question: If tagging cells and removing them in export with nbconvert, what is the fastest way to tag cells in VSCode?
Although it is still a bit cumbersome, I think it is a feasible method: press F12 to open the browser's developer tools, then delete the cells or cell outputs you do not want.
I still don't know if there is an easier way but here is what I have done with help from ChatGPT, this blog post, and this StackOverflow answer.
First, define a function that adds a cell tag to the cells you want to hide:
import json

def add_cell_tag(nb_path, tag, cell_indices):
    # Open the .ipynb file
    with open(nb_path, 'r', encoding='utf-8') as f:
        nb = json.load(f)
    # Get the cells from the notebook
    cells = nb['cells']
    # Add the tag to the specified cells
    for index in cell_indices:
        cell = cells[index]
        if 'metadata' not in cell:
            cell['metadata'] = {}
        if 'tags' not in cell['metadata']:
            cell['metadata']['tags'] = []
        cell['metadata']['tags'].append(tag)
    # Save the modified notebook
    with open(nb_path, 'w', encoding='utf-8') as f:
        json.dump(nb, f)
Second, run the function and add a tag (can be any string) to the cells you want to hide in the HTML export:
add_cell_tag(nb_path, 'hide-code', [0, 1, 2])
Finally, use nbconvert in the terminal to export and filter the notebook:
jupyter nbconvert --to html --TagRemovePreprocessor.remove_cell_tags=hide-code path/to/notebook.ipynb
Cells may be removed entirely, or you can remove just the input or just the output with these options:
TagRemovePreprocessor.remove_input_tags
TagRemovePreprocessor.remove_single_output_tags
TagRemovePreprocessor.remove_all_outputs_tags
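I was not sure of the difference between those last two. As far as I can tell from the nbconvert documentation, remove_all_outputs_tags matches tags on the cell and removes all of its outputs, while remove_single_output_tags matches tags stored in an individual output's metadata and removes only that output. Additionally, I had a helper function to count the cells in the notebook and one to clear all tags in the notebook.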
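The two helpers mentioned above might look like the following sketch. The names count_cells and clear_all_tags are my own for this example, and it assumes the notebook uses the nbformat 4 layout with a top-level "cells" list, the same layout add_cell_tag relies on.

```python
import json


def count_cells(nb_path):
    """Return the number of cells in the notebook file."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    return len(nb["cells"])


def clear_all_tags(nb_path):
    """Remove the 'tags' entry from every cell's metadata and save the file."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb["cells"]:
        # Cells without metadata or without tags are left unchanged.
        cell.get("metadata", {}).pop("tags", None)
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
    return nb
```

count_cells is handy for checking which indices are valid before calling add_cell_tag, and clear_all_tags resets the notebook before tagging again.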
I have successfully replicated the logging methodology by @Mercury in this post: Reconnecting remote Jupyter Notebook and get current cell output
Namely, adding this code chunk to my notebook:
import sys
import logging
nblog = open("nb.log", "a+")
sys.stdout.echo = nblog
sys.stderr.echo = nblog
get_ipython().log.handlers[0].stream = nblog
get_ipython().log.setLevel(logging.INFO)
My main edit to that code is replacing a+ with w+ because I want to overwrite the log file every time I rerun my notebook.
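To make the difference between the two modes concrete, here is a small standalone sketch. It writes to a throwaway temporary file rather than the real nb.log, and the log lines are placeholders.

```python
import os
import tempfile

# Throwaway location for the demo (not the notebook's real nb.log).
log_path = os.path.join(tempfile.mkdtemp(), "nb.log")

# "a+" appends: text from earlier runs is kept.
for run in ("first run\n", "second run\n"):
    with open(log_path, "a+") as nblog:
        nblog.write(run)
with open(log_path) as f:
    appended = f.read()

# "w+" truncates the file on open: only the latest run remains.
with open(log_path, "w+") as nblog:
    nblog.write("third run\n")
with open(log_path) as f:
    overwritten = f.read()

print(repr(appended))
print(repr(overwritten))
```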
However, I would like my logger to include information from cell outputs that aren't explicitly printed. For example, if I do head(df) in a cell instead of print(head(df)). Is that possible?
Thanks!
h2o_model.accuracy prints model validation data when executed in a Jupyter Notebook cell (which is desirable, despite the function name). How to save this whole validation output (entire notebook cell contents) to a file? Please test before suggesting redirections.
I'd be careful using %%capture: it doesn't capture HTML content (tables) in stdout.
redirect_stdout works flawlessly when used from the Python CLI or a script. IPython/Jupyter might cause issues with tables, as they are displayed, not printed. Note that you should not use .readlines() to get the results from StringIO; use .getvalue().
You can use h2o_model.save_model_details(path) to persist information about the model to a JSON file (which might serve you better in the long run, but it's not really human readable).
If you really want output that looks like what you would get from a Jupyter notebook, you can use the following hack:
Create a template Jupyter notebook that contains:
import os
import h2o
h2o.connect(verbose=False)
h2o.get_model(os.environ["H2O_MODEL"])
and in your original notebook add
!H2O_MODEL={h2o_model.key} jupyter nbconvert --to html --execute template.ipynb --output={h2o_model.key}_results.html
You can also create a template for nbconvert to hide the code cells.
You should call h2o_model.accuracy() (note the parentheses). The reason the whole model gets printed is a non-idiomatic implementation of __repr__ in h2o models, which prints rather than returning a string (there's a JIRA to fix that).
If you encounter some other situation where you would like to save printed output of some command, you can use redirect_stdout[1] to capture it (assuming you have python 3.4+).
[1] https://docs.python.org/3.9/library/contextlib.html#contextlib.redirect_stdout
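A minimal sketch of the redirect_stdout approach, using only the standard library. The function noisy_summary is a made-up stand-in for any call that prints instead of returning a string, and its text is a placeholder, not real model output.

```python
import contextlib
import io


def capture_printed_output(func, *args, **kwargs):
    """Call func and return (result, text it printed to stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    # getvalue() returns everything written so far; readlines() would
    # return nothing here because the stream position is at the end.
    return result, buffer.getvalue()


def noisy_summary():
    """Hypothetical stand-in for a method that prints its report."""
    print("validation summary (placeholder text)")


result, text = capture_printed_output(noisy_summary)
```

The captured text can then be written to a file with an ordinary open(..., 'w'). This only catches what is written to sys.stdout, so tables that Jupyter displays as HTML are still not captured.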
Ok, so only the h2o_model.accuracy output cannot be captured, while xgb_model.cross_validation_metrics_summary or even h2o_model alone can - e.g. like that:
%%capture captured_output
# print model validation
# data to `captured_output`
xgb_model
In another notebook cell:
# print(captured_output.stdout.replace("\n\n","\n"))
with open(filename, 'w') as f:
    f.write(captured_output.stdout.replace("\n\n", "\n"))
I want to load (i.e., copy the code as with %load) the code from a code cell in one jupyter notebook into another jupyter notebook (Jupyter running Python, but not sure if that matters). I would really like to enter something like
%load cell[5] notebookname.ipynb
The command would copy all code in cell 5 of notebookname.ipynb into the code cell of the notebook I am working on. Does anybody know a trick to do that?
Adapting some code found here at Jupyter Notebook, the following will display the code of a specific cell in the specified notebook:
import io
from nbformat import read

def print_cell_code(fname, cellno):
    with io.open(fname, 'r', encoding='utf-8') as f:
        nb = read(f, 4)
    cell = nb.cells[cellno]
    print(cell.source)

print_cell_code("Untitled.ipynb", 2)
Not sure what you want to do once the code is there, but maybe this can be adapted to suit your needs. Try print(nb.cells) to see what read brings in.
You'll probably want to use or write your own nbconvert preprocessor to extract a cell from one notebook and insert it into another. It takes a good amount of reading through the docs to understand how to write a preprocessor, but this is the preferred way.
The quick fix is that the nbformat specification is based on JSON, which means that if you read an .ipynb file with pure Python (i.e. with open and read), you can call json.loads on it to turn the entire file into a dict. From there, you can access the cells in the cells entry (which is a list of cells). So, something like this:
import json

with open("nb1.ipynb", "r", encoding="utf-8") as f1, open("nb2.ipynb", "r", encoding="utf-8") as f2:
    nb1, nb2 = json.loads(f1.read()), json.loads(f2.read())
# Adds nb1's first cell to the end of nb2 (in memory only; nb2.ipynb is not rewritten).
nb2["cells"].append(nb1["cells"][0])
This assumes (as does your question) there is no metadata conflict between the notebooks.
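For completeness, here is a sketch of the same JSON approach that also writes the result back to disk. The helper names load_notebook and copy_cell are hypothetical, and it assumes nbformat 4 files with no conflicting cell ids or metadata between the two notebooks.

```python
import json


def load_notebook(path):
    """Read an .ipynb file into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def copy_cell(src_path, dst_path, cell_index):
    """Append one cell of src to the end of dst, save dst, return the cell's code."""
    src = load_notebook(src_path)
    dst = load_notebook(dst_path)
    cell = src["cells"][cell_index]
    dst["cells"].append(cell)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(dst, f, indent=1)
    # On disk, "source" may be a single string or a list of lines.
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)
```

Note that the destination notebook has to be reloaded in the Jupyter UI before the appended cell shows up.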
I am a little new to Python, and I have been using the Jupyter Notebook through Anaconda. I am trying to import a csv file to make a DataFrame, but I am unable to import the file.
Here is an attempt using the local method:
df = pd.read_csv('Workbook1')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-11-a2deb4e316ab> in <module>()
----> 1 df = pd.read_csv('Workbook1')
After that I tried using the path (I put user for my username)
df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-3f2bedd6c4de> in <module>()
----> 1 df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
I am using a Mac, which I am also new to, and I am not 100% sure I am giving the right path. Can anyone offer some insight or a solution that would allow me to open this CSV file?
Instead of passing the full path, you can change the working directory first using the code below:
import os
import pandas as pd
os.chdir("D:/dataset")
data = pd.read_csv("workbook1.csv")
This works as long as workbook1.csv is in that directory.
Are you sure that the file exists at the location you are passing to pandas' read_csv? You can check using Python's built-in os module:
import os
os.path.isfile('/Users/user/Desktop/Workbook1.csv')
Another way of checking if the file of interest is in the current working directory within a Jupyter notebook is by running ls -l within a cell:
ls -l
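Building on the checks above, this sketch reports how Python resolves a path string before you hand it to read_csv. The function name describe_path is made up for the example. A path without a leading slash, like the one in the question, is relative and gets resolved against the current working directory.

```python
from pathlib import Path


def describe_path(path_text):
    """Report how Python resolves a path string and whether the file exists."""
    path = Path(path_text).expanduser()  # turns a leading "~" into the home directory
    resolved = path if path.is_absolute() else Path.cwd() / path
    return {
        "given": path_text,
        "is_absolute": path.is_absolute(),
        "resolved": str(resolved),
        "exists": resolved.is_file(),
    }


# No leading slash, so this is treated as relative to the working directory.
info = describe_path("Users/user/Desktop/Workbook1.csv")
print(info["is_absolute"], info["resolved"])
```

If "resolved" is not where the file actually lives, either pass the absolute path (starting with / on a Mac) or a path beginning with ~ for the home directory.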
I think the problem is probably in the location of the file:
df1 = pd.read_csv('C:/Users/owner/Desktop/contacts.csv')
Once the file loads, you can start exploring the data with:
df1.head()
The os module in Python's standard library provides functions for interacting with the operating system.
import os
import pandas as pd
os.chdir(r"c:\Pandas")  # raw string so the backslash is not treated as an escape
df=pd.read_csv("names.csv")
df
This might help. :)
The file name is case sensitive, so check your case.
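A quick way to check for a case mismatch is to list the directory and compare names with case ignored. This is only a sketch; find_case_insensitive is a made-up helper name.

```python
import os


def find_case_insensitive(directory, filename):
    """Return names in `directory` that match `filename` when case is ignored."""
    wanted = filename.lower()
    return sorted(name for name in os.listdir(directory) if name.lower() == wanted)
```

For example, find_case_insensitive(".", "workbook1.csv") returns the actual spelling of the file in the current directory if one exists, and an empty list otherwise.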
I had the same problem on a Mac, and for some reason it only happened to me there. I tried many tricks but nothing worked. I recommend going directly to the file, right-clicking it, and then holding the Option (Alt) key; an option to copy the file's path (pathname) will appear. Paste that path into your notebook. For some reason that worked for me.
I believe the issue is that you're not using fully qualified paths. Try this:
Move the data into a suitable project directory. You can do this using the %%bash cell magic.
%%bash
mkdir -p /project/data/
cp data.csv /project/data/data.csv
You can then read the file:
with open("/project/data/data.csv", "r") as f:
    print(f.read())
But it might be most useful to load it with a library such as pandas:
import pandas as pd
data = pd.read_csv("/project/data/data.csv")
I’ve created a runnable Jupyter notebook with more details here: Jupyter Basics: Reading Files.
Try double quotes instead of single quotes. It worked for me.
You can open CSV files in a Jupyter notebook by following these steps:
Step 1 - Create a directory or folder (you can also use an existing folder).
Step 2 - Change your Jupyter working directory to that directory:
import os
os.chdir('D:/datascience/csvfiles')
Step 3 - Now your directory is changed in Jupyter Notebook. Store your file(s) in that directory.
Step 4 - Open your file -
import pandas as pd
df = pd.read_csv("workbook1.csv")
Now your file is read and stored in the DataFrame variable df. You can display its content as follows:
df.head() - displays the first five rows of the file
df - displays all rows of the file
Happy Data Science!
I had a similar problem while reading a CSV file from my computer in a Jupyter notebook.
I solved it by replacing the "\" symbol with "/" in the path, like this.
This is what I had:
"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"
This is what I changed it to:
"C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv"
This is what worked for me. I am using Mac OS.
Save your CSV in a separate folder on your desktop.
In the Jupyter file browser, open the folder where your dataset is saved, then click New in the upper right-hand corner to create a notebook there.
In the new notebook, code as usual and read your data with import pandas as pd and pd.read_csv, passing the name of your dataset file.
No need for anything extra, just put r in front of the path.
df = pd.read_csv(r'C:/Users/owner/Desktop/contacts.csv')
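To illustrate why the r prefix matters for Windows-style paths with backslashes, here is a short standard-library sketch. The path is only an example string; nothing is read from disk.

```python
from pathlib import PureWindowsPath

# A raw string (r"...") keeps every backslash literal. Without the r prefix,
# "\U" in a normal string starts a Unicode escape and is a SyntaxError.
raw_path = r"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"

# as_posix() renders the same path with forward slashes.
forward_path = PureWindowsPath(raw_path).as_posix()
print(forward_path)  # C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv
```

Either form, the raw string with backslashes or the forward-slash version, can be passed to pd.read_csv on Windows.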