Databricks: how to get the output of Notebook Jobs via the API? - python

My Python notebooks log some data to stdout, and when I run these notebooks via the UI I can see the output inside the cells.
Now I am running Python notebooks on Databricks via the API (/2.1/jobs/create and then /2.1/jobs/run-now) and would like to get the output. I tried both /2.1/jobs/runs/get and /2.1/jobs/runs/get-output, but neither of them includes the notebook's stdout.
Is there any way to access the notebook's stdout via the API?
P.S. I am aware of dbutils.notebook.exit() and will use it if getting stdout turns out not to be possible.

Unfortunately, it is not possible to get the plain stdout of Python code through the API. I have seen that on a different cloud provider you can also get such output from the logs.
The 100% reliable solution is to use dbutils.notebook.exit().
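For illustration, here is a minimal sketch of fetching a run's result with the Python requests library, assuming the notebook ends with dbutils.notebook.exit(...); the host, token, and run id are placeholders you must fill in:

    # Minimal sketch: fetch a notebook run's result via the Jobs API.
    # Assumes the notebook ends with dbutils.notebook.exit(<value>);
    # HOST, TOKEN and RUN_ID are placeholders.
    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder
    RUN_ID = 12345                                           # placeholder

    resp = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get-output",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"run_id": RUN_ID},
    )
    resp.raise_for_status()

    # notebook_output.result holds the value passed to dbutils.notebook.exit();
    # plain print() output is not returned by this endpoint.
    print(resp.json().get("notebook_output", {}).get("result"))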

Related

What's the best way of sharing a Jupyter notebook script with others so they can run it on their Windows computer?

I just wrote a script in Jupyter Notebook and I'm wondering what format I should save this file in if I want to share it with other people so they can run it on their Windows computer. I tried to convert my .ipynb to both .py and .exe, but neither seems to work... maybe I'm doing something wrong.
Converting a Jupyter Notebook to a .py or .exe file would make little sense, as notebooks exist precisely so you can execute individual blocks of code on click. You could convert the notebook and its output to a .pdf file, as you can read here: How to convert IPython notebooks to PDF and HTML?
If you want to port your Jupyter Notebook to a Python file, you'd have to make sure that you include in the .py file all the code that is written down in the notebook. Keep in mind that, when using a regular Python file, things obviously won't look as great as in Jupyter.
The best way to share your notebooks would be to send the actual file, so the recipients can open it in Jupyter Notebook, or - in case they don't want to install Jupyter or Python on their device - they could use an online version of Jupyter Notebook like https://nbviewer.jupyter.org/ - there are multiple websites that offer that kind of service.
You can try the Mercury framework for converting a Jupyter Notebook into a web application. Mercury can generate the widgets for your notebook, so you don't need GUI packages (like tkinter). The widgets are generated from a YAML header and are connected to variables in the notebook code. Your users can change the widget values and execute the notebook. The final result can easily be exported to PDF. You can read more in the tutorial on how to share a Jupyter Notebook with non-technical users.
[Screenshot: the example notebook with the YAML header]
[Screenshot: the example notebook converted to a web app]
Notice that the notebook's code is hidden (show-code: False in the YAML).
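As a rough illustration, a header like the one below (placed in the first raw cell of the notebook) is what drives the generated widgets; the widget name and values here are made up, and the exact schema may differ between Mercury versions:

    ---
    title: My notebook app          # app title shown in Mercury
    description: Demo web app       # hypothetical description
    show-code: False                # hide the notebook's code cells
    params:
        n_points:                   # must match a variable in the notebook
            input: slider
            label: Number of points
            value: 100
            min: 10
            max: 500
    ---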
Mercury can easily be deployed to Heroku or any cloud provider (DigitalOcean, GCP, Azure, AWS); please check the docs for details. To share the app, you just send out the server URL.

Does a Python Jupyter Notebook expose local data to the web?

I'm a total noob at Python; I just downloaded Anaconda and started to use Jupyter Notebook.
I was wondering: since Jupyter Notebook looks web-based, should I have any privacy concerns using it? I.e., is the data on my PC exposed to the web?
You probably shouldn't worry about running notebooks on your localhost. If you want more info (e.g. if you ever intend to run your notebook on a remote server), this link will give you some insight regarding security concerns.
TL;DR: no, your data are not exposed.
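For context, the notebook server binds only to the loopback interface out of the box. A minimal sketch of the relevant settings in jupyter_notebook_config.py (these lines just restate the documented defaults for the classic notebook server):

    # Defaults in jupyter_notebook_config.py (classic notebook server):
    # the server listens only on localhost, so it is not reachable from
    # other machines unless you change this deliberately.
    c.NotebookApp.ip = 'localhost'    # loopback only (the default)
    c.NotebookApp.open_browser = True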

How to allow users to run Jupyter notebooks stored in my database on my server?

I have a problem and I need a hint on how to approach it.
I have a Django application in which I have some Jupyter notebooks stored in my database. At this point, users can download the notebooks and run them on their computers.
I would like to add functionality where a user could run a notebook online. I was thinking of two solutions:
the first is to use some free online service, like Google Colab, but I haven't found any with an API to which I could send a file from my database (maybe you know of one?);
the second is to run JupyterHub on my server. I saw how to run JupyterHub remotely, but I don't know how to grant users access so that they can run notebooks simultaneously without getting access to the server itself, and how to do all of this from Django.
Do you have any hints that could help me get this functionality?
JupyterHub is a good approach if you trust your users. However, if you want to run untrusted code (like Google Colab does), you need sandboxing. In that case you can use a Docker image to run the notebooks, for example mikebirdgeneau/jupyterlab. There is also a docker-compose example: https://github.com/mikebirdgeneau/jupyterlab-docker/blob/master/docker-compose.yml
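A minimal docker-compose sketch along those lines (the image tag, port mapping, and volume path below are assumptions, not taken from the linked file):

    # Sketch of a sandboxed JupyterLab service; adjust image tag,
    # port and volume to your setup.
    version: "3"
    services:
      jupyterlab:
        image: mikebirdgeneau/jupyterlab:latest
        ports:
          - "8888:8888"               # JupyterLab UI
        volumes:
          - ./notebooks:/home/jovyan  # hypothetical notebook mount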

Jupyter notebook execution based on user-entered automation parameters

I am trying to build a service that would allow users of a notebook to set automation parameters in a cell, such as the start time at which the notebook should begin executing. The service would then take this input time, execute the notebook at the desired time, and store the executed notebook in S3. I have looked into papermill, but I believe there is no way to add automation parameters like a start execution time with it. Is there any way to achieve this? Or can papermill achieve it?
Papermill handles just parameterizing and executing notebooks, not scheduling. For that, you need another tool. You can build something yourself on top of Apache Airflow, which seems to be the most widespread scheduler for such a case; it has native support for Papermill (see here, and the sketch below). Or you can use a ready-made tool like Paperboy.
To read in-depth about scheduling notebooks, take a look at the article by Netflix.
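For example, a minimal Airflow DAG using the Papermill provider might look like this; the DAG id, schedule, and S3 paths are illustrative assumptions:

    # Sketch: execute a parameterized notebook on a schedule and write
    # the result to S3. Requires apache-airflow-providers-papermill.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.papermill.operators.papermill import PapermillOperator

    with DAG(
        dag_id="scheduled_notebook_run",    # hypothetical name
        start_date=datetime(2024, 1, 1),    # maps the user's start time
        schedule_interval="@daily",         # hypothetical schedule
        catchup=False,
    ) as dag:
        PapermillOperator(
            task_id="execute_notebook",
            input_nb="s3://my-bucket/input.ipynb",              # hypothetical path
            output_nb="s3://my-bucket/runs/out-{{ ds }}.ipynb", # hypothetical path
            parameters={"run_date": "{{ ds }}"},  # injected by papermill
        )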
Take a look at the code here and here for a wrapper that will schedule notebook execution.
The shell scripts above create a VM, run the notebook, save the output, and destroy the instance.
In Google Cloud AI Platform Notebooks we provide a scheduling service, which is currently in beta.

Use Jupyter notebooks without Databricks in Azure Data Factory?

I gather from the documentation that we can use Jupyter notebooks only with a Databricks Spark cluster.
Is there a way around this? Can I call Jupyter notebook as an activity from ADF without Databricks environment? I would like to have a simple way to call some python code from ADF.
Thanks!
You can try a Custom Activity in ADF. Custom Activities support cmd commands, so you can use the command line to invoke a Python script.
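For instance, assuming Jupyter is installed on the Batch pool that backs the Custom Activity (the notebook file names here are placeholders), a single cmd command can execute a notebook end to end:

    jupyter nbconvert --to notebook --execute input_notebook.ipynb --output executed_notebook.ipynb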
And there's another example of using Python in a Custom Activity:
https://github.com/rawatsudhir1/ADFPythonCustomActivity
Hope it helps.
