I'm currently running a default/basic notebook on EMR (release label emr-6.1.0; applications: Spark 3.0.0, Zeppelin 0.9.0, JupyterHub 1.1.0), and I'm having trouble getting the notebook to output a data-profiling report as HTML.
I've installed pandas-profiling in a variety of ways: through custom bootstrap actions and with the command sc.install_pypi_package("pandas-profiling").
After trying to generate the report's HTML using IPython, I run into the following issue: it only produces the object's repr instead of the rendered HTML.
I'm also aware that adding %%local can help produce it, as shown below.
But installing through bootstrap actions or on the notebook does not install into the environment that %%local runs in, as seen below.
So my first question is: can this profile report be produced without the %%local magic? I know there is also an %%html magic, but based on my testing it cannot print out a variable, hence why I need IPython.
Second question: how can pandas-profiling be added to this %%local environment? Is this even the right approach?
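For reference, the inline-display pattern I'm aiming for looks like this (a minimal sketch; the placeholder string stands in for the output of pandas-profiling's ProfileReport.to_html()):

```python
from IPython.display import HTML, display

# Any HTML string renders inline when wrapped in HTML() and displayed;
# the profiling report's .to_html() output would be passed the same way.
report_html = "<h2>Profile report</h2><p>placeholder for profile.to_html()</p>"
display(HTML(report_html))
```

If this pattern worked outside %%local, the report object's to_html() output could be fed straight to HTML().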
Thanks!!
I am looking for a way to programmatically replicate the Run Cells Below functionality in VS Code.
Previously, I used Jupyter through Conda and used the following code:
import ipywidgets as widgets
from IPython.display import display,Markdown,Javascript,HTML
def run_below(ev):
    display(Javascript('IPython.notebook.execute_cells_below()'))
button = widgets.Button(description="Click to run cells below")
button.on_click(run_below)
display(button)
This code worked great, but when I tried to plop it into VS Code, the button just does nothing. I don't understand much about how the VS Code Jupyter backend works, but I imagine it has something to do with the IPython.notebook module not working correctly in this IDE (or perhaps the IPython.display.Javascript module?). I really have no idea, though.
Does anyone know how I could do this in VSCode's Jupyter implementation?
I have searched for hours on this topic but have not been able to find a working solution. Please let me know if y'all have any ideas.
Environment Info:
Python Version: 3.9.12
VSCode Version: 1.69.0
Jupyter Extension Version: v2022.6.1001902341
It appears that accessing the kernel in VS Code is not possible at this time. See the following GitHub issues to check whether this has changed by the time you read this:
Similar question migrated to #6918
Issue #6918, which will resolve this problem once closed
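Given that limitation, one common workaround is to stop asking the front end to execute cells and instead put the downstream work into ordinary Python functions called from the button callback (a sketch; run_pipeline is a hypothetical stand-in for your own code):

```python
import ipywidgets as widgets
from IPython.display import display

out = widgets.Output()

def run_pipeline():
    # Stand-in for whatever the cells below would compute.
    return sum(range(5))

def on_click(_):
    with out:
        print("result:", run_pipeline())

button = widgets.Button(description="Run pipeline")
button.on_click(on_click)
display(button, out)
```

Nothing here depends on IPython.notebook, so it works regardless of whether the front end exposes the kernel.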
This is not exactly an answer to the detailed question, but it does answer the question's title, "Programmatically execute cell jupyter vscode", since I landed on this page searching for how to do that. The following code supports this task. Just make sure to save (Ctrl+S) if you change cells that will be run by this function, since it reads the current version of the file from disk.
def execute_cell(filepath, cell_number_range=(0,)):
    import nbformat
    # nbformat.current was removed long ago; read the saved file with the
    # modern API and normalize it to the version-4 schema.
    nb = nbformat.read(filepath, as_version=4)
    ip = get_ipython()
    for cell_number in cell_number_range:
        cell = nb.cells[cell_number]
        if cell.cell_type == 'code':
            ip.run_cell(cell.source)
Also, finding the name of the current notebook is easy in VS Code (though not in JupyterLab):

import os
globals()['__vsc_ipynb_file__']                    # full path (VS Code only)
os.path.basename(globals()['__vsc_ipynb_file__'])  # basename (VS Code only)
globals()['_dh']                                   # current directory in most environments
I am using a Jupyter notebook on my PC. As my code grows longer, I want to hide and show sections of the Python code based on the markdown headings, the way Google Colab does.
Is there a Python package to install in the environment specifically for this functionality? See the screenshots below.
Hidden code based on the headings: https://i.stack.imgur.com/kQSMG.png
Expanded Python code: https://i.stack.imgur.com/20HNw.png
This is partially related to an already-closed question about the keras package and R in Google Colab, but I have some specific doubts about such a workflow.
We know how to use R in Google Colab, and access to Colab's GPU and TPU is indeed interesting.
Although the documentation says we need to run install_keras() in R if we want to use the GPU with keras, in Google Colab it works without this setting; no separate Python installation is required either.
But deep-learning processes are time-consuming, and running all the code in a single notebook has limitations. Splitting it into several notebooks, then saving and sharing the resulting objects for reuse in the next notebook, would be attractive.
This is all the more desirable because the environment is ephemeral. The natural solution would be mounting Google Drive to read its data and save partial outputs there, but mounting Google Drive appears to be restricted to Python notebooks. Yes, there are discussions proposing solutions, as here and here, but I was not able to implement them.
So I am really curious: how do R keras users (and other R users) deal with this issue when using Google Colab?
If we keep to this idea of a workflow using more than one notebook, a possibly related question is this one (without an answer).
So I have tried another alternative: using a Python notebook and running R in specific cells inside it with rpy2, as indicated here and in the other discussions I mentioned before. Yes, one can ask why not just code in Python; ignoring that, I am still keeping R.
But it happens that R's keras is an API for Python's keras and needs Python to run. I do not know why, but when I try to run any keras function, even something as simple as
%%R
imdb<-dataset_imdb()
I get:
R[write to console]: Error: Error 1 occurred creating conda
environment /root/.local/share/r-miniconda/envs/r-reticulate
I also saw a claim that the R kernel does not see Colab's Python, as here, but I know that is not true: R's keras works in the R kernel, and if I run the same py_config there, I can see the Python versions.
But the point is: why, in this Python notebook using rpy2, can we not see Python?
If we run the notebook with the R kernel, every package that requires Python works without any intervention. That is strange.
I have seen discussions on how to install conda, as here, but I believe that should not be the way; maybe it is related to rpy2.
I have tried some alternatives to check for existing Python versions from inside the R cells (called with %%R), and I believe R invoked this way cannot see Python:
%%R
library(reticulate)
py_config()
It returns the same error:
R[write to console]: Error: Error 1 occurred creating conda
environment /root/.local/share/r-miniconda/envs/r-reticulate
So, my major question: how do I effectively use R's keras (and other R packages that rely on Python in the background) inside a Google Colab notebook with a Python kernel? What am I missing with rpy2?
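One thing I have also been meaning to try (an assumption on my part, not a confirmed fix): setting RETICULATE_PYTHON before rpy2 initializes R, so reticulate picks up Colab's existing interpreter instead of attempting to create the miniconda environment that fails:

```python
import os
import sys

# Assumption: if RETICULATE_PYTHON is set before R initializes, reticulate
# uses this interpreter instead of trying to create
# /root/.local/share/r-miniconda/envs/r-reticulate.
os.environ["RETICULATE_PYTHON"] = sys.executable
```

If that assumption holds, a subsequent %%R cell running library(reticulate); py_config() should report the same interpreter.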
I use a Bokeh server locally to visualize data. I tried doing the same in Azure's version of Databricks, but couldn't get even the first lines of this simple example to run:
from bokeh.io import push_notebook, show, output_notebook
from bokeh.plotting import figure
output_notebook() # <- fails
This fails with the following error:
TypeError: publish_display_data() missing 1 required positional
argument: 'data'
I investigated further and found that Databricks is apparently built on IPython 2.2.0, which is over four years old!
import IPython
IPython.__version__ # Returns '2.2.0'
Is there anything I can do? Has anyone had success running a Bokeh server in Databricks? I want some kind of interactive dashboard, and Databricks' own dashboard is extremely limited.
As you note, IPython 2.2.0 is ancient. I'm not sure how far back you'd have to go in Bokeh releases to find one that supports it. The function publish_display_data is a Jupyter/IPython API, and unfortunately it has seen a few breaking changes over the years. The Bokeh project used to maintain a compatibility polyfill for it to smooth over these changes and support older versions, but it was removed in this commit last year:
https://github.com/bokeh/bokeh/commit/fb3f9cc4f9e9af786698462a9849e46c0ea34cf2
After that commit, notebook 4.3 is the minimum version for any use. Before that commit, some set of earlier Jupyter releases will work, but I can't say exactly how much earlier, and I can't guarantee that embedded Bokeh server apps would work (i.e., very possibly only inline standalone plots would work). Embedded Bokeh server apps have never been tested on anything earlier than Jupyter 4.3, and I would never claim that Bokeh supports embedded apps in notebook versions older than that.
TL;DR: I highly doubt things are workable on IPython 2.2.0.
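If it helps, a rough guard before calling output_notebook() can compare the running version against that floor (a sketch that assumes plain numeric dotted version strings):

```python
def version_tuple(version_string):
    # "2.2.0" -> (2, 2, 0); non-numeric components (e.g. "rc1") are dropped.
    return tuple(int(p) for p in version_string.split(".") if p.isdigit())

# Databricks' bundled IPython vs. the oldest notebook stack Bokeh's
# notebook support was tested on:
print(version_tuple("2.2.0") < version_tuple("4.3"))  # True
```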
One of the main advantages of using Jupyter is the ability to code and document at the same time. I often use headings and subheadings to group the code in a notebook. However, as the notebook gets longer, navigation becomes much harder.
It would be nice to have a document map in a separate (left) pane that keeps track of the markdown headings. Once a heading is selected in the document map, the respective section would appear in the main pane.
Is there an extension for this task?
I cannot comment on your question (too little reputation), so apologies if this answer is a bit off.
I've found an extension that looks like what you need:
src: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/toc2
Documentation: http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html
Another one, not as good (IMHO): https://github.com/captainsafia/notebook-toc
Possible duplicate of: How can I add a table of contents to an ipython notebook?
EDIT
Install instructions:
toc2 is included in https://github.com/ipython-contrib/jupyter_contrib_nbextensions
To install just toc2, I did this:
pip3 install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable toc2/main
EDIT 2
How to display the TOC inside a notebook: http://awesomescreenshot.com/0116cqvh5b