How to test Jupyter notebooks on Travis CI?

Is there a way to deploy Jupyter Notebooks on Travis CI and test running all the cells?
My Jupyter Notebooks use IPython Kernel, and I have a conda environment file.

I've been wondering something similar and have compiled some information but haven't fully tested it yet.
Firstly, you can rely on jupyter nbconvert --execute to run notebooks, after which you can check the output for errors. There's an example set up with Travis CI and Conda at ghego/travis_anaconda_jupyter. I believe that setup also uses pytest to catch issues, though I'm not entirely sure how the pieces fit together.
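For reference, here's a minimal sketch of the build steps such a Travis job might run. The environment name test-env, the file name environment.yml, and the notebook name analysis.ipynb are placeholders, not taken from the repo above:
conda env create -f environment.yml    # placeholder env file name
source activate test-env               # placeholder env name
jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=600 analysis.ipynb
If any cell raises an error, nbconvert exits non-zero, which is what makes the CI build fail.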
Another way you can run this is with pytest-notebook, which relies on you having a working version of the notebooks you want in some environment. This package's main purpose is to detect if changes to the environment will create issues within the notebooks. This can also potentially be used in conjunction with the above method, though it might be redundant.
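If you want to try pytest-notebook, the basic flow looks roughly like this (flags taken from my reading of its docs, so double-check them for your version):
pip install pytest-notebook
pytest --nb-test-files path/to/notebooks/    # placeholder path to your .ipynb files
This re-executes the committed notebooks and fails if the regenerated outputs diverge from the saved ones.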
It might additionally be beneficial for version management (tracking, seeing diffs, legibility) to write your notebooks in Markdown format and then use jupytext to convert them into .ipynb files to run with the options above. jupytext can also execute notebooks directly with the --execute flag, so perhaps there's an even simpler way to integrate such a workflow!
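A rough sketch of that jupytext workflow, with analysis.md as a stand-in for your notebook source:
jupytext --to notebook analysis.md              # just produces analysis.ipynb
jupytext --to notebook --execute analysis.md    # converts and runs the cells in one step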
I will be testing this in the coming weeks and will update this comment if I learn anything new.

Related

Kedro using wrong conda environment

I have created a conda environment called Foo. After activating this environment I installed Kedro with pip, since conda was giving me a conflict. Even though I'm inside the Foo environment, when I run:
kedro jupyter lab
It picks up the modules from my base environment, not the Foo environment. Any idea why this is happening, and how I can change which modules my notebook detects?
Edit
By poking around my setup I found out that \AppData\Roaming\jupyter\kernels\kedro_project\kernel.json was calling the Python from the base environment, not the Foo environment. I changed it manually, but is there a more automatic way of setting \AppData\Roaming\jupyter\kernels\kedro_project\kernel.json to use the environment I'm currently in?
The custom Kedro kernel spec is a feature that I recently added to Kedro. When you run kedro jupyter lab/notebook it should automatically pick up the conda environment without you needing to manually edit the kernel.json file. I tested this myself to check that it worked, so I'm very interested in understanding what's going on here!
The function _create_kernel is what creates the Kedro kernel spec. The docstring for that explains what's going on, but in short we delegate to ipykernel.kernelspec.install. This generates a kernelspec that points towards the Python path given by sys.executable (see make_ipkernel_cmd). In theory this should already point towards the correct Python path, which takes account of the conda environment.
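As a sanity check, you can reproduce what that delegation does by hand. Assuming the environment is called Foo and is currently activated, something like this shows which Python the kernelspec will point at and registers an equivalent kernel (the kernel name is just an example):
python -c "import sys; print(sys.executable)"    # should print the Foo env's Python, not base
python -m ipykernel install --user --name foo --display-name "Python (Foo)"    # example kernel name
If the first command prints the base environment's Python, the generated kernel.json will inherit the wrong path.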
It's worth running which kedro to see which conda environment it points to, and if we need to debug further then please do raise an issue on our GitHub repo. I'd definitely like to get to the bottom of this and understand where the problem is.
P.S. you can also do a plain jupyter lab/notebook to launch a kernel with the right conda environment and then run %load_ext kedro.extras.extensions.ipython in the first cell. This is basically equivalent to using the Kedro kernelspec, which loads the Kedro IPython extension automatically.
This is likely a problem with Jupyter. I'd suggest trying to run jupyter notebook directly and checking whether the issue is down to Kedro or Jupyter.
I remember facing something similar due to some Jupyter problem but don't remember how I fixed it. I remember trying some solutions from this issue on jupyter.
Try doing pip install jupyterlab in your Foo environment; the Jupyter kernel is a separate concept and sometimes acts weird.

JupyterLab reopens the most recently closed set of notebooks automatically, regardless of which Conda environment I start it in

I have separate Conda environments for work and outside projects. I start JupyterLab with a simple jupyter lab command within an Anaconda command prompt (on Windows, but my issue occurs on MacOS and Linux too). I conda activate the relevant environment before invoking the startup command. However, JupyterLab always auto-reopens the most recent set of notebooks I had open, without regard to which Conda environment I had been working in before vs. now. It also defaults to the associated working directory (I know that I can at least explicitly specify a working directory on startup, though). Is there a way for me to get JupyterLab to associate a certain set of notebooks with a specific Conda environment?
Note that this question is separate from issues concerning kernels. I use nb_conda_kernels to manage kernels for each notebook, and I have a handle on how that works. I'm just wondering about this auto-reopen feature. Does anyone know about how it works (i.e., where it draws the last-open files from)?
As a slight side note, it seems to often act like all of the newly-reopened files are unsaved, even after I save them all and do a simple restart of JupyterLab. I'm curious about why that happens too, but it's less important.
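Not a full answer, but a pointer for anyone investigating: as far as I know, the reopened layout comes from JupyterLab's workspace files (JSON stored under ~/.jupyter/lab/workspaces by default), which are tied to the JupyterLab URL/workspace name rather than to any Conda environment, so every environment launching the default workspace shares one saved layout. You can dump the state it will restore with:
jupyter lab workspaces export > current-workspace.json    # file name is just an example
Treat the storage location as something to verify for your JupyterLab version.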

Getting an R Notebook to work in Jetbrains Dataspell

So I am just starting a data science/stats class and I am trying to set up an R notebook within DataSpell. I am able to create a Jupyter notebook, but it only wants a Python interpreter and I can't seem to change the interpreter to R.
It only allows me to set a Python interpreter. I am able to run R files just fine, but I am trying to do it in a notebook. (Whether that be Jupyter or some other notebook, I couldn't care less.)
I would like to stick to JetBrains IDEs, either DataSpell or PyCharm. I tried out Datalore and got an R notebook working, but it's really slow for me.
Actually, there's a currently (as of May 2022) undocumented feature under development in DataSpell 2022.1 that supports R kernels.
If you have a Conda environment which has R and the irkernel installed and configured (for instance, if you set up your Anaconda environment to run Jupyter notebooks with the R kernel), you can open existing R notebooks in DataSpell and run them just like Python notebooks. The only thing you can't do is create new ones, sadly.
Basically, just use Anaconda or Miniconda to create an environment which can run Jupyter notebooks with the irkernel, create your notebooks in Jupyter, then point your Dataspell directory at it and it should work.
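If you don't already have such an environment, something along these lines should work (the environment name r-notebooks is just an example; the packages come from conda-forge):
conda create -n r-notebooks -c conda-forge r-irkernel jupyterlab    # example env name
conda activate r-notebooks
jupyter lab
Create and save your R notebooks from that JupyterLab session, then open the containing directory in DataSpell.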
They do mention that it's a developing feature, so if you encounter bugs you can flag them in the DataSpell issue tracker. I tried it and it has worked pretty well so far.
DataSpell can be a little difficult to navigate. In any case, File > New... pops up a menu:
Pick "RMarkdown File", which for your purposes will work the same as a notebook. RStudio has both markdown and notebook options, but they both use the .Rmd extension.
You should also see an R Console button at the bottom of the screen.

Why wouldn't Python code run in a Jupyter notebook when it runs in the terminal?

I've replicated Slack's API demo in a text editor and the terminal, but just for fun, I wanted to do it again using a Jupyter notebook. I used the same virtual environment so as to rule out dependency issues, and the code was the same (cloned from the master repo). But for some reason, I always get ModuleNotFound errors when I try to run the code in Jupyter cells and import the necessary packages.
I even re-built a fresh virtual environment and ran the demo at a terminal from within Jupyter, and running the scripts works fine. It just doesn't execute in the notebook environment.
I'm using the same kernel across my Python interpreters, and I always start my virtualenv session before launching the notebook.
Anyone have any ideas why this would be the case?
Well, DUH. It was just a matter of uninstalling/reinstalling the package in a fresh virtual environment (and I'm pretty sure the latter wasn't absolutely necessary).
Thanks to this comment on GitHub for the guidance.
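For anyone landing here with the same symptom: a quick sanity check is to compare the Python path your virtualenv provides against the one the notebook kernel reports, then reinstall the package into whichever environment the kernel actually uses. The package name slackclient below is my guess based on the Slack API demo context:
python -c "import sys; print(sys.executable)"    # run inside the activated virtualenv
pip uninstall -y slackclient                     # package name is a guess, adjust to yours
pip install slackclient
If that path differs from what sys.executable prints inside a notebook cell, the kernel isn't using your virtualenv at all.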

Anaconda uses Jupyter notebooks, so why the option to install VS Code?

I installed Anaconda as it's a recommended way to start with Jupyter notebooks.
I was surprised at the end of the Anaconda Windows install to be invited to install Microsoft VS Code as a code editor. Reading about VS Code, it seems a well-respected editor, but does that not take away from the idea of using Jupyter notebooks? Or am I missing something?
Anaconda also installs IDLE and Spyder, which are IDEs (Integrated Development Environments). Anaconda simply gives you a choice. Each of those choices has its advantages and disadvantages. Using one does not prohibit you from also using another.
Jupyter might recommend using Anaconda, but this doesn't imply that Anaconda would recommend using Jupyter.
You are right that if you are going to focus on Jupyter notebooks you won't need to use VS Code.
But most people that use Anaconda are not using Jupyter notebooks - they write Python scripts, not notebooks - and for that VS Code is a respected choice.
