How to import one Databricks notebook into another? - python

I have a Python notebook A in Azure Databricks with an import statement like the one below:
import xyz, datetime, ...
Another notebook, xyz, is being imported into notebook A, as shown in the code above.
When I run notebook A, it throws the following error:
ImportError: No module named xyz
Both notebooks are in the same workspace directory. Can anyone help in resolving this?

The only way to import one notebook into another is with the %run command, using either an absolute path:
%run /Shared/MyNotebook
or a relative path:
%run ./MyNotebook
More details: https://docs.azuredatabricks.net/user-guide/notebooks/notebook-workflows.html
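As a minimal sketch (the notebook name and helper function here are hypothetical), suppose /Shared/MyNotebook contains:
def add_one(x):
    return x + 1
In the calling notebook, %run executes MyNotebook in the caller's context, so add_one becomes available. Note that %run must be alone in its cell:
%run /Shared/MyNotebook
Then, in the next cell:
add_one(41)  # -> 42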

To get a result back as a DataFrame from a different notebook in Databricks, you can do the following.
notebook1:
def func1(df):
    df = df.transformation_logic  # pseudocode: apply your transformation here
    return df
notebook2:
%run path-of-notebook1
df = func1(dfinput)
Here dfinput is the DataFrame you are passing in, and you get the transformed DataFrame back from func1.
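As a concrete sketch (the notebook path, column names, and transformation here are assumptions, not part of the original answer), notebook1 could contain:
from pyspark.sql import functions as F

def func1(df):
    # example transformation: add a derived column
    return df.withColumn('amount_with_tax', F.col('amount') * 1.1)
and notebook2 would run it and call the function:
%run /Shared/notebook1
transformed_df = func1(dfinput)
transformed_df.show()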

Related

ImportError Only in Jupyter Notebook

When I try importing functions and classes that I've created in Python scripts into a Jupyter Notebook, I get import errors. However, when I run the same code in a regular script rather than in a notebook, it runs without a problem.
All three files are in the same directory:
First, I have my_function_script.py, which defines functions:
def my_function():
    pass
Second, I have my_class_script.py, which both imports the function and defines classes:
from my_function_script import my_function

class my_class():
    pass
When I try to run the below import script in a Jupyter Notebook, I get an ImportError.
from my_class_script import my_class
ImportError Traceback (most recent call last)
<ipython-input-6-8f2c4c886b44> in <module>
----> 1 from my_class_script import my_class
~\my_directory\my_class_script.py in <module>
5
----> 6 from my_function_script import my_function
ImportError: cannot import name 'my_function' from 'my_function_script' (C:\Users\my_directory\my_function_script.py)
I believe that the problem is specific to the Jupyter Notebook for two reasons. First, I've confirmed that both my_function_script.py and my_class_script.py can run in the terminal without error. Second, when I take the same line that causes the Jupyter Notebook to error and run it in a regular Python script, it runs without error.
I have Windows, and I don't have multiple environments or versions of Python.
You can add to the Python path at runtime:
# some_file.py
import sys
# insert at 1, 0 is the script path (or '' in REPL)
sys.path.insert(1, '/path/to/application/app/folder')
import file
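Applied to this question's layout (the path is taken from the traceback above), the first notebook cell might look like:
import sys

# make the directory containing my_function_script.py importable
sys.path.insert(1, r'C:\Users\my_directory')

from my_class_script import my_class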
This usually happens when your Jupyter notebook does not point to the correct directory for importing.
One way to fix this is to change the working directory in the Jupyter notebook:
import os
os.chdir('path/to/python/scripts/directory')
from my_class_script import my_class
Another way is to use UNIX commands directly in the Jupyter notebook:
cd /path/to/python_script_directory
from my_class_script import my_class
Make sure you don't put the cd command and the import statement in the same notebook cell.

Execute python script with spark

I want to pass a Python script to the SparkContext within my Jupyter notebook and have the output shown in the notebook as well. To test, I'm simply executing the following in my Jupyter notebook:
from pyspark import SparkConf, SparkContext

sparkConf = SparkConf()
sc = SparkContext(conf=sparkConf)
sc.addPyFile('test.py')
With test.py looking like
rdd = sc.parallelize(range(100000000))
print(rdd.sum())
But when I execute the sc.addPyFile line in my notebook, I do not see the output. Am I passing the pyspark script into my SparkContext incorrectly?
The function you are using does not trigger a job; instead, it passes the Python module to the SparkContext so that it can be imported in your script as needed.
See here:
https://spark.apache.org/docs/0.7.3/api/pyspark/pyspark.context.SparkContext-class.html#addPyFile
To trigger a job, you need to run
spark-submit test.py
outside of your Jupyter notebook.
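A minimal sketch of how addPyFile is normally used (the helper function here is an assumption): keep reusable code in test.py with no top-level job, ship it with addPyFile so the executors can import it, and drive the job from the notebook itself:
# test.py -- reusable code only, no top-level Spark job
def square(x):
    return x * x
Then, in the notebook:
sc.addPyFile('test.py')  # ships the module to the executors

import test  # importable on the driver too, since test.py sits in the working directory
rdd = sc.parallelize(range(10))
print(rdd.map(test.square).sum())  # the output appears in the notebook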

Trying to load Python dataframe into Hadoop (Impala) using `ibis`, getting "AttributeError: module 'ibis' has no attribute 'impala' "

I'm running the following block of Python commands in a Jupyter notebook to upload my dataframe, labeled df, to Impala:
import hdfs
from hdfs.ext.kerberos import KerberosClient
import pandas as pd
import ibis
hdfs = KerberosClient('< URL address >')
client = ibis.impala.connect(host="impala.sys.cigna.com", port=25003, timeout=3600, auth_mechanism="GSSAPI", hdfs_client=hdfs)
db=client.database("< database >")
db.create_table("pythonIBISTest", df)
...but I am getting the error message AttributeError: module 'ibis' has no attribute 'impala'.
Note: I've already installed the hdfs, ibis, ibis-framework[Kerberos], and impyla modules in the Jupyter terminal.
What am I doing wrong?
You may need to pip install ibis-framework[impala] to get the impala part of ibis
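A quick sketch to confirm the backend is available after installing (restart the kernel before re-importing; the hasattr check is an assumption about how the backend is exposed):
pip install ibis-framework[impala]
Then, in a fresh Python session:
import ibis
print(hasattr(ibis, 'impala'))  # True once the impala extra is installed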

importing functions from another jupyter notebook

I am trying to import a function from another Jupyter notebook.
In n1.ipynb:
def test_func(x):
    return x + 1
-> run this
In n2.ipynb:
%%capture
%%run n1.ipynb
test_func(2)
Error:
NameError Traceback (most recent call last)<ipython-input-2-4255cde9aae3> in <module>()
----> 1 test_func(1)
NameError: name 'test_func' is not defined
Any easy ways to do this please?
The nbimporter module helps us here:
pip install nbimporter
For example, with two notebooks in this directory structure:
/src/configuration_nb.ipynb
analysis.ipynb
/src/configuration_nb.ipynb:
class Configuration_nb():
    def __init__(self):
        print('hello from configuration notebook')
analysis.ipynb:
import nbimporter
from src import configuration_nb
new = configuration_nb.Configuration_nb()
output:
Importing Jupyter notebook from ......\src\configuration_nb.ipynb
hello from configuration notebook
We can also import and use modules from python files.
/src/configuration.py
class Configuration():
    def __init__(self):
        print('hello from configuration.py')
analysis.ipynb:
import nbimporter
from src import configuration
new = configuration.Configuration()
output:
hello from configuration.py
Something I've done to import functions into a Jupyter notebook has been to write the functions in a separate Python .py file then use the magic command %run in the notebook. Here's an example of at least one way to do this:
Both notebook.ipynb and helper_functions.py are in the same directory.
helper_functions.py:
def hello_world():
    print('Hello world!')
notebook.ipynb:
%run -i helper_functions.py
hello_world()
notebook.ipynb output:
Hello world!
The %run command tells the notebook to run the specified file and the -i option runs that file in the IPython namespace, which is not really meaningful in this simple example but is useful if your functions interact with variables in the notebook. Check out the docs if I'm not providing enough detail for you.
For what it's worth, I also tried running the function definitions in an outside .ipynb file rather than an outside .py file, and it worked for me. Might be worth exploring if you want to keep everything in notebooks.
Based on Kurt's answer:
%run -i configuration.ipynb
This runs another notebook, and in the next cell you can access the variables defined by that notebook.
This works for me:
from some_dir.pythonFile import functionName
%run ./some_dir/pythonFile.py
This works as well:
%load_ext autoreload
%autoreload 2
from some_dir.pythonFile import functionName
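The autoreload extension re-imports modified modules before each cell runs, so edits to pythonFile.py are picked up without restarting the kernel. A sketch of the workflow (the names follow the answer above):
%load_ext autoreload
%autoreload 2

from some_dir.pythonFile import functionName
functionName()  # uses the current version of the code

# ...edit and save some_dir/pythonFile.py...
functionName()  # re-run this cell: autoreload picks up the change automatically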

How to run an existing function from Jupyter notebook

I am using Jupyter notebook. In the same folder in which the notebook is running, I have a function f defined as
def f(x):
    return x**2
I have saved this function as f.py in the same folder. Now I want to call this function in the notebook that is running. How do I do that? If the function was typed into the notebook, I could have just typed
f(4)
Try the load magic:
%load f.py
That automatically loads in the entire contents of the file so that you can edit it in a cell.
from f import f
is another option.
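For example, with the f.py from the question:
from f import f
f(4)  # -> 16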
If neither of those works for you, try adding your notebook's directory to the system path by running this block as a cell before trying to call your function:
import os
import sys

nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)
%run f.py
The load magic just copied the whole file into a cell, which was not what I needed. Importing didn't work for me either; it threw some weird errors. So I ended up using the run magic.
