I wanted to repeat an analysis included in a large Jupyter notebook and save a couple of the generated dataframes to CSV files. My first attempt was to open the notebook in jupyter-lab, interactively run the cells I was interested in, and then add a new cell with code that saves the relevant data to files.
I was wondering, however, whether I could do the same thing with a script, e.g. run the 1st and 45th cells of the notebook and then run a few more commands to export the data.
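One way this could be scripted is sketched below, under the assumption that the notebook is an nbformat 4 JSON file and that the selected cells contain plain Python (no `%magic` or `!shell` lines). The function name `run_cells` is hypothetical, not part of any Jupyter API; it simply executes the chosen code cells in one shared namespace.

```python
import json


def run_cells(nb_path, cell_indices, namespace=None):
    """Execute the chosen code cells of a notebook in one shared namespace."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)

    namespace = {} if namespace is None else namespace
    for index in cell_indices:
        cell = nb["cells"][index]
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # The file stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        # Assumption: the cell is plain Python, with no IPython magics
        exec(compile(source, f"{nb_path}[cell {index}]", "exec"), namespace)
    return namespace
```

Running the 1st and 45th cells would then be `ns = run_cells("analysis.ipynb", [0, 44])`, after which a dataframe defined in those cells could be exported with something like `ns["result_df"].to_csv("result.csv")` (both names are placeholders).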
cell 0
import IPython.display
cell 1
x = 1
cell 2
x = x + 1
print(x)
cell 3
IPython.display.Javascript("Jupyter.notebook.execute_cells([2])")
Is it possible for a Jupyter Notebook cell to execute another cell programmatically? (i.e. using Python)
And if so, is it possible to specify the cell number to execute?
There is a JavaScript function called execute_cells (see it on GitHub) that, when given a list of cell indices, runs those cells.
%%javascript
Jupyter.notebook.execute_cells([0]) // 0 to run first cell in notebook etc.
If you need to run it from a Python code cell, you can use the Javascript class in the IPython.display module to execute JavaScript:
from IPython.display import Javascript
Javascript("Jupyter.notebook.execute_cells([2])")
Note that this will move the cursor to the executed cells. If you wish to get back to the original cursor position, you can look up the index of the next cell and execute it (code adapted from this answer):
%%javascript
Jupyter.notebook.execute_cells([0]) // 0 to run first cell in notebook etc.
var output_area = this;
// find my cell element
var cell_element = output_area.element.parents('.cell');
// which cell is it?
var cell_idx = Jupyter.notebook.get_cell_elements().index(cell_element);
Jupyter.notebook.execute_cells([cell_idx+1]) // execute next cell
I would like to add to the answer by @Louise Davies. If you want to execute a range of cells, use:
from IPython.display import Javascript
Javascript("Jupyter.notebook.execute_cell_range(10,20)")
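Building these JavaScript strings by hand gets error-prone once the indices come from variables. A minimal sketch of two helpers follows; the function names are hypothetical, and the approach assumes the classic Notebook front end, since the `Jupyter.notebook` object is not available in JupyterLab or VS Code.

```python
import json


def execute_cells_js(indices):
    """Build the JavaScript call that asks the classic Notebook to run cells."""
    indices = [int(i) for i in indices]  # reject non-integer input early
    return f"Jupyter.notebook.execute_cells({json.dumps(indices)})"


def execute_cell_range_js(start, end):
    """Build the JavaScript call for a contiguous range of cells."""
    return f"Jupyter.notebook.execute_cell_range({int(start)}, {int(end)})"
```

Inside a notebook the result would be passed to IPython, e.g. `display(Javascript(execute_cells_js([2])))` with both names imported from `IPython.display`. Wrapping the object in `display` matters when the call is not the last expression in the cell, because otherwise the Javascript object is never rendered and nothing runs.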
I only want certain cells and certain cell outputs to show up when I export my Jupyter Notebook from VSCode. I have not been able to get an answer that works from Google, StackOverflow, or ChatGPT.
So when I export the .ipynb file to HTML in VSCode, how do I control which cells are included in the HTML and which are not? For example, what would I do to include just the output of the cell below and not the actual code?
import pandas as pd
import seaborn as sns
df = pd.read_csv("file.csv")
sns.histplot(df['Variable 1'])
This post seems to indicate the best/only option is tagging cells then removing them with nbconvert. This seems inefficient in VSCode, especially compared to the easy output = FALSE or echo = FALSE in RStudio.
This seems like it should be an easy and common question, but I am getting no good solutions from the internet. ChatGPT suggested adding #hide-in-export to the cells I didn't want, but that didn't work.
The StackOverflow post I linked suggested using TagRemovePreprocessor with nbconvert and marking all the cells I want gone but that seems so clunky. Follow-up question: If tagging cells and removing them in export with nbconvert, what is the fastest way to tag cells in VSCode?
It is a bit cumbersome, but I think this is still a feasible method: press F12 to open the browser's developer tools and delete the cells or output cells there.
I still don't know if there is an easier way but here is what I have done with help from ChatGPT, this blog post, and this StackOverflow answer.
First, define a function that adds a tag to the cells you want to hide:
import json

def add_cell_tag(nb_path, tag, cell_indices):
    # Open the .ipynb file
    with open(nb_path, 'r', encoding='utf-8') as f:
        nb = json.load(f)
    # Get the cells from the notebook
    cells = nb['cells']
    # Add the tag to the specified cells
    for index in cell_indices:
        cell = cells[index]
        if 'metadata' not in cell:
            cell['metadata'] = {}
        if 'tags' not in cell['metadata']:
            cell['metadata']['tags'] = []
        cell['metadata']['tags'].append(tag)
    # Save the modified notebook
    with open(nb_path, 'w', encoding='utf-8') as f:
        json.dump(nb, f)
Second, run the function and add a tag (can be any string) to the cells you want to hide in the HTML export:
add_cell_tag(nb_path, 'hide-code', [0, 1, 2])
Finally, use nbconvert in the terminal to export and filter the notebook:
jupyter nbconvert --to html --TagRemovePreprocessor.remove_cell_tags=hide-code path/to/notebook.ipynb
The cells may be removed entirely, or just their output or just their input:
TagRemovePreprocessor.remove_input_tags
TagRemovePreprocessor.remove_single_output_tags
TagRemovePreprocessor.remove_all_outputs_tags
I'm not sure of the difference between those last two. Additionally, I had a helper function to count the cells in the notebook and one to clear all tags in the notebook.
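The two helpers mentioned above are not shown in the post. A rough sketch of what they might look like, using the same plain-JSON approach as add_cell_tag, is given here. The names `count_cells` and `clear_cell_tags` are made up for illustration, and the code assumes an nbformat 4 file with a top-level "cells" list.

```python
import json


def count_cells(nb_path):
    """Return the number of cells in the notebook."""
    with open(nb_path, "r", encoding="utf-8") as f:
        return len(json.load(f)["cells"])


def clear_cell_tags(nb_path):
    """Remove the 'tags' entry from every cell and save the notebook in place."""
    with open(nb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb["cells"]:
        # Cells without metadata or without tags are left untouched
        cell.get("metadata", {}).pop("tags", None)
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```

Knowing the cell count makes it easier to pass valid indices to add_cell_tag, and clearing the tags gives a clean slate before re-tagging for a different export.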
I want to load (i.e., copy the code as with %load) the code from a code cell in one jupyter notebook into another jupyter notebook (Jupyter running Python, but not sure if that matters). I would really like to enter something like
%load cell[5] notebookname.ipynb
The command would copy all the code in cell 5 of notebookname.ipynb into the code cell of the notebook I am working on. Does anybody know a trick to do that?
Adapting some code found here at Jupyter Notebook, the following will display the code of a specific cell in the specified notebook:
import io
from nbformat import read

def print_cell_code(fname, cellno):
    with io.open(fname, 'r', encoding='utf-8') as f:
        nb = read(f, 4)
    cell = nb.cells[cellno]
    print(cell.source)

print_cell_code("Untitled.ipynb", 2)
Not sure what you want to do once the code is there, but maybe this can be adapted to suit your needs. Try print(nb.cells) to see what read brings in.
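To get closer to the %load behaviour asked for, the source could be returned as a string and handed to IPython's set_next_input. The sketch below is only one possible adaptation: `cell_source` and `load_cell` are hypothetical names, the file is read as plain JSON rather than through nbformat, and `load_cell` only works when called from inside a running IPython kernel.

```python
import json


def cell_source(fname, cellno):
    """Return the source of cell number `cellno` (0-based) as one string."""
    with open(fname, "r", encoding="utf-8") as f:
        nb = json.load(f)
    source = nb["cells"][cellno]["source"]
    # The file stores source either as one string or as a list of lines
    return "".join(source) if isinstance(source, list) else source


def load_cell(fname, cellno):
    """Put the cell's code into the current input cell, similar to %load."""
    # Imported here so cell_source stays usable outside IPython
    from IPython import get_ipython

    get_ipython().set_next_input(cell_source(fname, cellno), replace=True)
```

Calling `load_cell("notebookname.ipynb", 5)` in a cell should then replace that cell's content with the code from the other notebook, ready to edit or run.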
You'll probably want to use or write your own nbconvert preprocessor to extract a cell from one notebook and insert it into another. It takes a good amount of digging through these docs to understand how to write a preprocessor, but this is the preferred way.
The quick fix is that the nbformat specification is built on JSON, which means that if you read in an ipynb file with pure Python (i.e. with open and read), you can call json.loads on it to turn the entire file into a dict. From there, you can access the cells in the cells entry (which is a list of cells). So, something like this:
import json

with open("nb1.ipynb", "r") as f1, open("nb2.ipynb", "r") as f2:
    nb1, nb2 = json.loads(f1.read()), json.loads(f2.read())

nb2["cells"].append(nb1["cells"][0])  # adds nb1's first cell to the end of nb2 (in memory only)
This assumes (as does your question) there is no metadata conflict between the notebooks.
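The snippet above only changes the dict in memory, so the result still has to be written back to disk. A slightly fuller sketch, with a hypothetical function name `append_cell`, writes to a separate output path so the original notebook is not overwritten. Clearing the copied cell's outputs is a design choice here, not something the nbformat specification requires.

```python
import copy
import json


def append_cell(src_path, dst_path, cell_index, out_path):
    """Append one cell of the source notebook to the destination notebook."""
    with open(src_path, "r", encoding="utf-8") as f:
        src = json.load(f)
    with open(dst_path, "r", encoding="utf-8") as f:
        dst = json.load(f)

    cell = copy.deepcopy(src["cells"][cell_index])
    if cell.get("cell_type") == "code":
        # Drop stale results so the copied cell looks unexecuted
        cell["outputs"] = []
        cell["execution_count"] = None
    dst["cells"].append(cell)

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(dst, f, indent=1)
    return len(dst["cells"])  # new cell count of the written notebook
```

For example, `append_cell("nb1.ipynb", "nb2.ipynb", 0, "nb2_merged.ipynb")` would add nb1's first cell to the end of a copy of nb2.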
I'm trying to write Python code in my Jupyter notebook that will export the whole current notebook to HTML/PDF.
I know I can do something like this:
import subprocess
subprocess.call(["jupyter", "nbconvert", "--to", "html", "notebook.ipynb"])
But that requires the file name, which is not known inside the notebook.
I saw many "hacks" to get the notebook file name, like this one:
How do I get the current IPython Notebook name
but I prefer to find a better solution.
Is there a smart way to export the current running notebook to a html/pdf file?
Thanks.
You can use the papermill library, which executes the notebook with injected parameters (here, the notebook file names, which you define outside the notebook).
Python program
import papermill as pm

pm.execute_notebook(
    'notebooks/notebookA.ipynb',  # notebook to execute
    'notebooks/temp.ipynb',  # temporary notebook that will contain the outputs and be used to save the HTML
    report_mode=True,  # should hide the injected-parameters cell, but it is not working for me
    parameters=dict(filename_ipynb='notebooks/temp.ipynb',
                    filename_html='output/notebookA.html')
)
/notebooks/notebookA.ipynb
(To insert as the last cell of your notebook)
import os
os.system('jupyter nbconvert --output ../' + filename_html + ' --to html ' + filename_ipynb)
One caveat:
papermill adds a cell that sets the injected parameters. report_mode=True should hide it, but somehow that doesn't work: https://github.com/nteract/papermill/issues/253
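If running the notebook through papermill is not an option, some front ends expose the notebook path to the kernel, which avoids the file-name hacks mentioned in the question. The sketch below rests on two assumptions that should be checked against your setup: that VS Code's Jupyter extension defines a `__vsc_ipynb_file__` global, and that recent Jupyter Server versions set a `JPY_SESSION_NAME` environment variable holding the notebook's path. The function names are hypothetical.

```python
import os


def guess_notebook_path(namespace, environ=None):
    """Best-effort lookup of the running notebook's path; None if unknown."""
    environ = os.environ if environ is None else environ
    # Assumption: VS Code's Jupyter extension defines this global in the kernel
    path = namespace.get("__vsc_ipynb_file__")
    if path:
        return path
    # Assumption: recent Jupyter Server versions export the session path here
    return environ.get("JPY_SESSION_NAME") or None


def nbconvert_command(nb_path, fmt="html"):
    """Build an argument list suitable for subprocess.run."""
    return ["jupyter", "nbconvert", "--to", fmt, nb_path]
```

Inside the notebook this could be used as `subprocess.run(nbconvert_command(guess_notebook_path(globals())), check=True)`, after checking that the path is not None. Note that nbconvert reads the file on disk, so the export reflects the last saved state of the notebook, not unsaved edits.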
I have a data set taken from kaggle, and I want to get the result shown here
So, I took that code, changed it a bit and what I ran is this:
# get titanic & test csv files as a DataFrame
titanic_df = pd.read_csv("./input/train.csv")
test_df = pd.read_csv("./input/test.csv")
# preview the data
print titanic_df.head()
This works, as it outputs the right data, but not as neatly as in the tutorial... Can I make it right?
Here is my output (Python 2, Spyder):
Try using Jupyter Notebook if you have not used it before. In the IPython console, wide output is wrapped and shown across multiple lines. What you are seeing on Kaggle is itself a Jupyter notebook.