Always have Jupyter notebook load with certain options / packages - python

Is there a way to ensure that jupyter notebook always starts with either:
1) Certain packages imported
and/or
2) Certain options set
I don't want to have to type the same things every time at the top of each notebook I run - e.g. always importing numpy or pandas.
Additionally, I always want to be able to see multiple outputs per cell. The following code makes this work just fine, but I want it saved as some sort of template that doesn't require manual typing from me each time.
Thanks!
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

First, you find where the startup folder is located.
# run this in a Jupyter notebook cell
import IPython
IPython.paths.get_ipython_dir()
On Windows, the response is u'C:\\Users\\yourname\\.ipython'; on Linux it is ~/.ipython.
That location contains profile folders; at minimum there is a profile_default. Each profile folder contains a startup folder.
You put a python script file in that folder (my case: C:/Users/myname/.ipython/profile_default/startup).
I named my script file 00-first.py and put this code in it:
import numpy as np
import pandas as pd
When I start the Jupyter notebook server with the default profile, the startup script will be executed prior to opening a Jupyter notebook.
In a newly opened Jupyter notebook, you can use numpy and pandas (as np and pd) without importing them first.
print(np.pi) #3.141592...
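Since the asker also wants the multiple-outputs setting, the same startup script can carry it, so one file covers both the imports and the option (any .py file in the startup folder runs, in lexicographic order):

```python
# ~/.ipython/profile_default/startup/00-first.py
import numpy as np
import pandas as pd

# Show the value of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
```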

Related

Change startup configuration for jupyter notebook?

I am trying to automatically load some extensions and their configs to jupyter notebook.
I am trying to automatically load these things every time the jupyter notebook opens:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autopandas=True
I have searched many links and tried to change ipython_config.py and other files.
Only one approach has worked for me, and it can only load the extension; I am struggling to make it set the config for that extension.
I have done this:
# file: ~/.ipython/profile_default/startup/startup.py
# usual imports
import numpy as np
# ipython
from IPython import get_ipython
ipython = get_ipython()
ipython.run_line_magic('load_ext', 'sql')
# Now, how to include %config SqlMagic.autocommit=False ?
Question
Note: the np works fine.
How to include config for the given extension?
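For what it's worth, run_line_magic takes the magic's name and its argument line as two separate strings, and %config is itself a line magic, so the same startup file should be able to set the configs too (an untested sketch; the None guard just makes the file harmless outside IPython):

```python
from IPython import get_ipython

ip = get_ipython()
if ip is not None:  # None when this file runs outside an IPython session
    ip.run_line_magic("load_ext", "sql")
    # %config is a line magic, so it can be invoked the same way
    ip.run_line_magic("config", "SqlMagic.autocommit=False")
    ip.run_line_magic("config", "SqlMagic.autopandas=True")
```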

Launch Jupyter Notebook with launcher and basic code in Linux Mint

I currently open Jupyter Notebook by running jupyter notebook in my terminal.
I have 2 questions that relate to a common goal.
1) How do I create a desktop icon (i.e. launcher) to start Jupyter Notebook on my Linux Mint computer?
2) I want to run some basic code upon double-clicking this launcher:
import numpy as np
import pandas as pd
import seaborn as sb
import sklearn as skl
I don't want to type this code whenever I open Jupyter Notebook, so it would be nice to automatically run it.
For #1, I'm trying to follow the steps here:
https://forums.linuxmint.com/viewtopic.php?t=256156#p1382045
However, I don't know where to find the path for the "Command" field. I tried browsing the Anaconda folder on my computer, but I can't find Jupyter Notebook there.
For your first question: if you open up a terminal, you can find where programs are installed with which, like so:
which jupyter
This should output where your particular jupyter is being called from.
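A launcher on Linux Mint is just a .desktop file; a minimal sketch could look like this (the Exec path below is a placeholder, use whatever which jupyter printed for you):

```ini
[Desktop Entry]
Type=Application
Name=Jupyter Notebook
Comment=Start a Jupyter Notebook server
Exec=/home/yourname/anaconda3/bin/jupyter notebook
Terminal=true
Categories=Development;
```

Saving this as ~/.local/share/applications/jupyter-notebook.desktop should make it appear in the menu; Terminal=true keeps the server's terminal window visible so you can shut it down later.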
For your second question, it looks like you can use a profile's startup folder to run code automatically.
In bash, you can run the following
# Create the folder if it doesn't already exist
mkdir -p ~/.ipython/profile_default/startup
# Create Python file to put your favorite imports
touch ~/.ipython/profile_default/startup/start.py
The start.py file (located in ~/.ipython/profile_default/startup/) is where you put your imports, so it should contain:
import numpy as np
import pandas as pd
import seaborn as sb
import sklearn as skl
Resources:
https://towardsdatascience.com/how-to-automatically-import-your-favorite-libraries-into-ipython-or-a-jupyter-notebook-9c69d89aa343
https://stackoverflow.com/a/11124846/
https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files
In sum, if you implement both of the suggestions above, you get an icon that starts a Jupyter Notebook and auto-loads your favorite Python packages.
Warning: for reproducibility and transferability of Jupyter Notebooks, I would typically advise against auto-loading libraries. I understand the repetition of having to load the same libraries all the time, but if for whatever reason you change computers or a colleague needs your code, your notebook will not work correctly. Just my two cents to keep in mind if/when implementing this.

Is there a way to run a default code in Jupyter Notebook without using %load or setting a profile? [duplicate]

Suppose I have a code snippet that I'd like to run every time I open a jupyter notebook (in my case it's opening up a Spark connection). Let's say I save that code in a .py script:
-- startup.py --
sc = "This is a spark connection"
I want to be able to have that code snippet run every time I open a kernel. I've found some stuff about the Jupyter Configuration File, but it doesn't seem like variables defined there show up when I try to run
print(sc)
in a notebook. Is there a command-line option that I could use -- something like:
jupyter notebook --startup-script startup.py
or do I have to include something like
from startup import sc, sqlContext
in all of the notebooks where I want those variables to be defined?
I'd recommend creating a startup file as you suggested and including it via
%load ~/.jupyter/startup.py
This will paste the content of the file into the cell, which you can then execute.
Alternatively, you can write a minimal, installable package that contains all your startup code.
Pro: Doesn't clutter your notebook
Con: More difficult to make small changes.
A custom package or explicit loading is not needed (though it might be preferred if you work with others): you can have auto-executed startup scripts
👉 https://stackoverflow.com/a/47051758/2611913

Jupyter Notebook: How to re-run notebook with different parameters (e.g. input data file)?

I have a notebook which does some complex data analysis on one dataset. At the end, it saves the notebook and converts it into HTML, so I can view the result later without running the notebook again. One example is this:
https://cdn.rawgit.com/cqcn1991/Wind-Speed-Analysis/master/output_HTML/marham.html
Now, I want to run the notebook over many different datasets. How can I do it?
Maybe something like
files = [
    './data/NCDC/cn/binhai/dat.txt',
    './data/NCDC/cn/luogang/dat.txt',
    './data/NCDC/cn/tianjing/dat.txt',
    './data/NCDC/cn/gushi/dat.txt',
    './data/NCDC/cn/yueyang/dat.txt',
]
for input_file_path in files:
    run_notebook('GMM.ipynb', input_file_path)
My thoughts:
I found Run parts of a ipython notebook in a loop / with different input parameter, but it only runs part of the cells within one notebook.
Scientific Computing & Ipython Notebook: How to organize code? provides a solution to run a notebook from within a host notebook. However, the target notebook itself doesn't get run; rather, its code runs within the host's environment. That leaves the original notebook unchanged, so when it is saved to HTML the results are all the same.
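One way to make the hypothetical run_notebook above concrete is nbconvert's ExecutePreprocessor, with the dataset path handed over through an environment variable that the notebook is assumed to read in its first cell (a sketch under those assumptions, not tested against the linked notebook):

```python
import os

def output_name(input_file_path):
    # Each dataset lives in its own folder (binhai, luogang, ...), so the
    # folder name is a convenient label for the executed copy.
    return os.path.basename(os.path.dirname(input_file_path)) + '.ipynb'

def run_notebook(template_path, input_file_path, out_dir='output'):
    # Imported lazily so output_name() stays usable without nbconvert installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(template_path, as_version=4)
    # The notebook's first cell is assumed to read this variable, e.g.:
    #   input_file_path = os.environ['INPUT_FILE']
    os.environ['INPUT_FILE'] = input_file_path
    ExecutePreprocessor(timeout=600).preprocess(nb, {'metadata': {'path': '.'}})
    os.makedirs(out_dir, exist_ok=True)
    nbformat.write(nb, os.path.join(out_dir, output_name(input_file_path)))
```

With this in place the loop from the question works as written, and each executed copy can then be turned into HTML with jupyter nbconvert --to html.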

Accessing Root in Jupyter Notebook

I've started a notebook from the D:/ drive, but navigated a few directories down. My notebook is at D:/dir1/dir2/notebook.ipynb. In my current notebook, I want to execute a script in the root of D:/, where my notebook session was started from.
I want to avoid relative path changes, and was hoping there's a way to access the directory where I started the notebook server (the location corresponding to localhost:xxxx/tree). Is that possible?
This is a slightly hacky way, but works:
import glob, json
from jupyter_core.paths import jupyter_runtime_dir

jrd = jupyter_runtime_dir()
with open(glob.glob(jrd + '/nbserver-*.json')[0]) as json_file:
    root_dir = json.load(json_file)['notebook_dir']
The reason for the globbing is that the JSON file you are looking for has the server's process ID (PID) in its name.
This method is therefore only guaranteed to work if you have a single notebook instance running. If you know the PID, you don't need glob.
