I have created my own library (package) and installed it in development mode using pip install -e.
Now I would like to edit the library's .py files and see the updates in a Jupyter notebook. Every time I edit one of the library's .py files, I close and reopen the IPython notebook to see the update. Is there an easy way to edit and debug .py package files?
Put this as the first cell of your notebooks:
%load_ext autoreload
%autoreload 2
More info in the doc.
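For example, with a hypothetical editable-installed package mypackage exposing a function greet (both names are placeholders), the workflow then looks like this:
# With autoreload enabled as above, import and use your package as usual
from mypackage import greet  # hypothetical package and function
greet("world")
# Now edit greet() in mypackage's .py source and re-run this cell:
# the updated behaviour appears without restarting the kernel.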
When you launch Jupyter, you initialize a Python kernel, which locks your Python session to the state of the environment at the moment the kernel started.
In this case, your kernel contains your locally installed (egg-linked) package as it was when you loaded Jupyter. Unfortunately, that means you would otherwise need to restart Jupyter every time you update your local package.
@BlackBear has a great solution: use autoreload in your first cell:
%load_ext autoreload
%autoreload 2
A follow-up solution assumes you do not need to make changes to your notebooks, but just want updated outputs after changes to your package. One way I have gotten around this is automated notebook generation using jupyter nbconvert and shell scripting: you create some Jupyter templates, stored in a templates folder, that you auto-execute every time you update your package.
An example script:
#!/bin/bash
# Remove previously executed notebooks
rm -f ./templates/*.nbconvert.ipynb
rm -f ./*.nbconvert.ipynb
# Re-execute every template notebook
for file in templates/*.ipynb
do
    echo "$file"
    jupyter nbconvert --to notebook --execute "$file"
done
# Move the executed copies to the project root
mv ./templates/*.nbconvert.ipynb .
Assuming you want to actively debug your package, I would recommend writing test scripts that spin up a fresh Python interpreter every time. E.g.:
# mytest.py
from mypackage import myfunction

# Placeholder input; replace with whatever myfunction expects
inputs = {'some': 'inputs'}
expected_outputs = {'some': 'expected', 'outputs': 'here'}

if myfunction(inputs) == expected_outputs:
    print('Success')
else:
    print('Fail')
Then run it with a fresh interpreter each time:
python3 mytest.py
On Databricks, it is possible to install Python packages directly from a Git repo or from DBFS:
%pip install git+https://github.com/myrepo
%pip install /dbfs/my-library-0.0.0-py3-none-any.whl
Is there a way to enable a live package development mode, similar to pip install -e, such that the Databricks notebook references the library files as-is and it's possible to update the library files on the go?
E.g. something like
%pip install /dbfs/my-library/ -e
combined with a way to keep my-library up-to-date?
Thanks!
I would recommend adopting the Databricks Repos functionality, which allows you to import Python code into a notebook as a normal package, including automatic reloading of the code when the package code changes.
You need to add the following two lines to the notebook that uses the Python package you're developing:
%load_ext autoreload
%autoreload 2
Your library is recognized because the main folders of Databricks Repos are automatically added to sys.path. If your library is in a Repo subfolder, you can add it via:
import os, sys
sys.path.append(os.path.abspath('/Workspace/Repos/<username>/path/to/your/library'))
This works on the driver (notebook) node, but not on worker nodes.
P.S. You can see examples in this Databricks cookbook and in this repository.
You can do %pip install -e in notebook scope, but you will need to do that every time you reattach. The code changes do not seem to reload with autoreload, since editable mode does not append to sys.path; instead it places a symlink in site-packages.
However, editable mode in cluster scope does not seem to work for me.
I did some more tests, and here are my findings for pip install in editable mode:
(1) If I am currently working in /Workspace/xxx/Repo1 and run %pip install -e /Workspace/xxx/Repo2 at notebook scope, it only gets recognized on the driver node, not on the worker nodes, when you run an RDD. A class function from Repo2 that I call from Repo1 works fine as long as it is used only on the driver node, but it fails on the worker nodes because they do not have /Workspace/xxx/Repo2 appended to sys.path. Apparently the worker node path gets out of sync with the driver node after %pip install in editable mode.
(2) Manually appending /Workspace/xxx/Repo2 to sys.path while working in a notebook in /Workspace/xxx/Repo1 also works only on the driver node, not on the workers. To make it work on the worker nodes, you need to append the same sys.path inside each job function you submit to the workers, which is not ideal (see the sketch after these findings).
(3) Installing /Workspace/xxx/Repo2 in editable mode from an init script works on both the driver and the worker nodes, because the environment path is initialized at the cluster init stage. This is the best option in my opinion, as it ensures consistency across all notebooks. The only downside is that /Workspace is not mounted at the cluster init stage, so it is not accessible; I could only make it work with pip install -e /dbfs/xxx/Repo2.
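For illustration, here is a rough sketch of the workaround from (2): appending to sys.path inside the function that actually runs on the workers. The repo path and the mymodule/transform names are placeholders.
import sys

REPO_PATH = '/Workspace/xxx/Repo2'  # placeholder repo path

def apply_transform(record):
    # Each worker process needs the path on sys.path before the import
    if REPO_PATH not in sys.path:
        sys.path.append(REPO_PATH)
    from mymodule import transform  # hypothetical module living in Repo2
    return transform(record)

# `sc` is the SparkContext that Databricks provides in the notebook
results = sc.parallelize([1, 2, 3]).map(apply_transform).collect()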
I have tried all of the things here on stack and on other sites with no joy...
I'd appreciate any suggestions please.
I have installed Jupyter and Notebook using pip3; please note that I updated pip3 before doing so.
However, when I try to check the versions with jupyter --version and notebook --version, my terminal returns command not found. I have also tried to run jupyter, notebook and jupyter notebook, and I still get the same message.
I have spent nearly two days now trying to sort this out... I'm on the verge of giving up.
I have a feeling it has something to do with my PATH variable maybe not pointing to where the jupyter executable is stored, but I don't know how to find out where notebook and jupyter are stored on my system.
many thanks in advance
Bobby
You should be able to run jupyter with python -m even if the PATH variable is not set up correctly.
python -m jupyter notebook
You can check the PATH variable on Windows by searching for env with the Windows search function and then clicking Edit the system environment variables > Environment Variables....
The path variable is a list of paths that the terminal checks for commands.
I haven't worked on a Mac for a long time, so I'm not sure how similar the Linux and macOS command lines still are, but on Debian you control your PATH variable like this.
View paths:
echo $PATH
/usr/local/bin:/usr/bin:/bin
Add a path:
export PATH=$PATH:/mynewpath
To make the export permanent, add it to ~/.bashrc.
To view the path of the pip package, you can use
pip3 show jupyter
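If you want to find out from Python itself where things ended up, here is a small sketch that uses only the standard library:
import shutil
import site
import sysconfig

# Full path of the `jupyter` executable if it is already on PATH, otherwise None
print(shutil.which("jupyter"))

# Directory where pip installs console scripts (like `jupyter`) for this interpreter
print(sysconfig.get_path("scripts"))

# For `pip3 install --user`, scripts usually end up under <user base>/bin instead
print(site.USER_BASE)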
If jupyter-notebook works but jupyter notebook does not, it looks to me like a symlink thing, or a Mac-specific problem.
So, to summarise, this is what I have found on this issue (in my experience):
To run the Jupyter app you can use the jupyter-notebook command, and this works. But why? Because jupyter-notebook is stored in /usr/local/bin, which is normally already on the PATH.
I then discovered that the jupyter notebook and jupyter --version commands work if I do the following:
open my ~/.bash_profile file
add the following to the bottom of the file: export PATH=$PATH:/Users/your-home-directory/Library/Python/3.7/bin
This adds the directory where jupyter is installed to your PATH variable.
Alternatively, as suggested by @HackLab, we can also do the following:
python3 -m jupyter notebook
Hopefully this will give anyone else hitting the same issues an easier time resolving them.
I've found a solution in the Jupyter documentation at https://jupyter-notebook.readthedocs.io/en/stable/troubleshooting.html, but I am still curious.
It states that to run the application you use the command jupyter-notebook, and hey presto, it does seem to work now. But why is this, when nearly everywhere else I have read that to run the app you just type the command jupyter notebook?
Also, if I need to check the version of any of the Jupyter components, how do I go about this now, given that jupyter --version and notebook --version still don't work?
Also, how do I go about finding these files in my file system if I have no idea where they are located? And how do I add them to my PATH so that I can, for example, check the version of these programs?
Will pip3 automatically update this software as and when needed?
Thanks again in advance
Have you tried locate jupyter? It may tell you where jupyter is on your system.
Also, why not try installing Jupyter via Anaconda to avoid the hassle?
I would definitely recommend going through Anaconda, which makes everything a lot easier.
The following is the link with step by step instructions: https://jupyter.readthedocs.io/en/latest/install.html
I am using an online Jupyter notebook that is somehow configured to read all .py files as Jupyter notebook files.
I am a big fan of this setup and would like to use it everywhere. On my own Jupyter installation, however, .py files are just interpreted as text files and are not loaded into Jupyter cells by default. How can I achieve the same configuration for my Jupyter notebook?
What you're looking for is jupytext.
You just need to install it into the Python environment from which you're running your Jupyter notebooks:
pip install jupytext --upgrade
After that, your .py files can be opened and edited as notebooks.
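For reference, this is roughly what a script in Jupytext's py:percent format looks like (one of the formats it understands; plain scripts open as notebooks too). Each # %% marker becomes a notebook cell:
# %% [markdown]
# # Example analysis
# Markdown cells are comments under a `# %% [markdown]` marker.

# %%
# Code cells start with a plain `# %%` marker
import math
print(math.sqrt(2))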
That's not exactly what you asked, but you can achieve something close by using the magic %load FILE.py in a new Jupyter notebook.
%load FILE.py copies the contents of FILE.py into the current cell when you execute it.
You can use the Python code in your Jupyter notebook by pasting the whole code into a cell, or:
%load pythonfile.py to load code from a file (not necessarily a .py file) into a Jupyter notebook cell;
%run pythonfile.py to execute the file instead of loading it (it outputs whatever that file outputs).
Also, pythonfile.py should exist in the current working directory, or you can use its full path.
I have a project containing a bunch of Python modules (.py files) and a bunch of Jupyter Notebooks (.ipynb files) which import things from the Python modules.
I can (assuming I've got __init__.py files in all subfolders) type-check all the .py files by simply running mypy . from the root of the project. But I'd like to also be able to type-check my Jupyter Notebooks.
An ideal solution would:
type check all Python code in my Jupyter Notebooks,
be able to follow imports of .py modules from within Jupyter Notebooks when type-checking, just like imports in .py files,
let me type-check the whole project from the command line, so that I can run type-checking as part of a test suite or a pre-commit hook, and
in some way meaningfully report the locations of type errors within my Notebooks, analogously to how mypy prints line numbers for errors in .py files.
How can I do this?
You could use nbQA and do
pip install -U nbqa
nbqa mypy your_notebook.ipynb
You can:
Convert all notebooks to python, then run mypy on that (How do I convert a IPython Notebook into a Python file via commandline?).
jupyter nbconvert --to script [YOUR_NOTEBOOK].ipynb
Just write a small script to do this and you are fine :)
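Such a script could look roughly like the sketch below; note that mypy will then report line numbers in the generated .py files rather than in the notebooks themselves:
# check_notebooks.py -- convert each notebook to a script, then run mypy on it
import pathlib
import subprocess

for nb in pathlib.Path(".").rglob("*.ipynb"):
    if ".ipynb_checkpoints" in nb.parts:
        continue
    subprocess.run(["jupyter", "nbconvert", "--to", "script", str(nb)], check=True)
    subprocess.run(["mypy", str(nb.with_suffix(".py"))])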
Check out nb-mypy:
Nb Mypy is a facility to automatically run mypy on Jupyter notebook cells as they are executed, whilst retaining information about the execution history.
More details here
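If I remember correctly, after installing it (pip install nb_mypy) you enable it per notebook with a load_ext magic along these lines (check the project page for the exact invocation):
%load_ext nb_mypy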
I use Jupytext and my IDE for this.
I export a copy in py:percent format and link it to the notebook. I do the development in the JupyterLab environment, but the .py file is the one that goes into the git repository. Before committing, I run it through the usual linters: black, pydocstyle, mypy (with a strict configuration). I then reload the notebook in JupyterLab, restart the kernel and 'Run All' to make sure the results are still OK, and only then commit the file to the repository.
I have a Jupyter notebook with both HTML and Python code in it. Is it possible to write a script that will launch the notebook and run it in the browser? Most solutions on the web refer to running these scripts from the command line, but I want them to show up in the browser.
I'm not sure what OS you're on, but here's a small batch file that moves into my directory with my .ipynb files, starts Jupyter, and then opens a specific notebook of mine:
cd "%userprofile%\desktop\att"
start chrome.exe http://localhost:8888/notebooks/ATT_SQL.ipynb#
jupyter notebook
I just put this on my desktop and double-click it to start Jupyter. Just replace your notebook's name where mine is ATT_SQL.ipynb#. You'll also have to change the cd command.
Edit:
Or better yet:
cd %userprofile%\path\to\your\jupyter\dir
jupyter notebook yourNotebook.ipynb
Source
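If you would rather have a single cross-platform launcher, a minimal Python sketch along the same lines (the directory and notebook name are placeholders) could be:
# launch_notebook.py -- start Jupyter and open a specific notebook in the browser
import subprocess
from pathlib import Path

notebook_dir = Path.home() / "path" / "to" / "your" / "jupyter" / "dir"  # placeholder
subprocess.run(["jupyter", "notebook", "yourNotebook.ipynb"], cwd=notebook_dir)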