Python Modules & Packages using Jupyter Notebook in Google Cloud Platform

I'm using Jupyter Notebooks within Google Cloud Datalab on Google Cloud Platform for my Python scripts, which creates .ipynb files.
I want to be able to create modules and packages using Jupyter Notebooks, but my other notebooks are not able to import them.
For example, mymodule.ipynb contains:
def test_function():
    print("Hello World")
Then in myscript.ipynb when I try to import the above:
from mymodule import test_function
It throws an error:
ImportError Traceback (most recent call last)
----> 1 from mymodule import test_function
ImportError: No module named mymodule
How do I create Modules & Packages using Jupyter Notebooks?

Notebooks can't be used as modules. You need to create a Python file (e.g. mymodule.py).
If you want, you can do this from within a Jupyter notebook:
with open('mymodule.py', 'w') as f:
    f.write('def test_function():\n')
    f.write('    print("Hello World")\n')

from mymodule import test_function
test_function()
# Hello World
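An equivalent, slightly tidier option is Jupyter's built-in %%writefile cell magic, which writes the body of a cell to a file; a minimal sketch (run as its own cell):
%%writefile mymodule.py
def test_function():
    print("Hello World")
Once that cell has run, mymodule.py exists next to the notebook and the import above works the same way.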

You cannot import Jupyter Notebooks the same way as Python files (modules and packages).
Check this link: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html

I've received a response on a different forum which answered my question. Here is the answer:
Put your module code into a .py file and make sure that's in the same directory as myscript.ipynb.
To create the .py file from the code currently in a notebook, you can download it from within Jupyter as a .py file and tidy it up in your text editor of choice.
Remember, the import statement is pure Python syntax. It knows nothing about - and has no wish to know anything about - Jupyter notebooks. It's looking for a .py file.
This has resolved the issue.
Thanks a lot.
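One practical follow-up, not part of the original answer: once the code lives in a .py file, edits to it are not picked up by an already-running notebook kernel unless you reload the module. A minimal sketch:
import importlib
import mymodule                 # the first import caches the module

importlib.reload(mymodule)      # pick up later edits to mymodule.py
mymodule.test_function()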

Related

How to import a module into another module in databricks notebook?

This is my config.py in Databricks
DATA_S3_LOCATION='s3://server-data/data1'
DATA_S3_FILE_TYPE='orc'
DATA2_S3_LOCATION='s3://server-data/data2'
DATA2_S3_FILE_TYPE='orc'
I have an __init__.py in this folder as well.
I am trying to access these variables in another file
import sys
sys.path.insert(1,'/Users/file')
from file import config
I am facing an error: no module named 'file'.
There are several aspects here.
If these files are notebooks, then you need to use %run ./config to include a notebook from the current directory (doc).
If you're using Databricks Repos and arbitrary files support is enabled, then your code needs to be a Python file, not a notebook, and have a correct directory layout with __init__.py, etc. In this case, you can use Python imports. Your repository directory is automatically added to sys.path and you don't need to modify it.
P.S. I have an example repository with both the notebook and Python-file approaches.
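A minimal sketch of both approaches (the utils package name is illustrative; DATA_S3_LOCATION comes from the config above):
# Approach 1: config is a notebook in the same folder.
# In Databricks, %run must be the only content of its cell:
%run ./config

# In a later cell, names defined in the config notebook are now in scope:
print(DATA_S3_LOCATION)

# Approach 2: Databricks Repos with arbitrary files enabled.
# Repo layout:
#   utils/
#   ├── __init__.py
#   └── config.py
from utils import config
print(config.DATA_S3_LOCATION)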

Including common/utility Python code into Jupyter Notebook on AWS EMR

I'd like to include code from another file in a Jupyter .ipynb file on the AWS Elastic MapReduce (EMR) platform.
But many of the methods I have seen online for including utility functions / common Python code, which seem to work outside of AWS, don't work inside the hosting environment for EMR-based notebooks. I'm assuming this is due to file-system security/server restrictions. If someone knows of an example of including code from a .py file and/or .ipynb file from the same directory that works on AWS, I would love to see it.
This method did not work for me; the find_notebook function returns None:
https://jupyter-notebook.readthedocs.io/en/4.x/examples/Notebook/rstversions/Importing%20Notebooks.html
Nor did this library/method:
ipynb import another ipynb file
Is there an AWS "approved" or recommended way of including common Python/PySpark code into a Jupyter Notebook?
Note: regarding this related question,
How to import from another ipynb file in EMR jupyter notebook which runs a PySpark kernel?
I have SSH'd into the master node and installed various packages, such as ipynb. Even though the module installed fine and the environment can see it, the overall technique did not work.
Error:
An error was encountered:
"'name' not in globals"
Traceback (most recent call last):
KeyError: "'name' not in globals"
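No EMR-specific fix is given here, but the Spark-level approach shown in the Dataproc answer further down should translate; an untested sketch, assuming a PySpark kernel and that the shared code has already been uploaded to an S3 bucket you control (the bucket, path, and some_helper function are all illustrative):
# Distribute the module to the driver and executors from S3
spark.sparkContext.addPyFile('s3://my-bucket/shared/common_utils.py')

import common_utils
common_utils.some_helper()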

Is there a way to import and run functions from saved .py files in a Jupyter notebook running on a Google Cloud Platform dataproc cluster?

When running a Jupyter notebook natively, it is simple to import functions and utilities from a saved .py script.
When I work in a Jupyter notebook running on a Google Cloud Platform Dataproc cluster and try the same thing (after having uploaded a .py script to my Dataproc Jupyter environment, so it is in the cloud***), I am unable to import the function into the (Dataproc) notebook.
Does anyone know how I can do this? Does it just have to do with figuring out the correct, but not obvious, path? (I am trying to import a .py file from within the same folder as the Jupyter notebook, so if this were running natively it wouldn't require a path, but perhaps it is different with Dataproc.)
*** I am not making the mistake of trying to import a desktop/native .py script into a GC Dataproc notebook.
Any help or leads would be very much appreciated!
If you are using the PySpark kernel, you can add dependencies to the sparkContext.
spark.sparkContext.addPyFile(f'gs://{your_bucket}/{path_to_file}/dependencies.zip')
Your dependencies.zip would contain a folder with all .py scripts and __init__.py:
dependencies/
├── __init__.py
└── my_script.py
You can then import all of your dependencies using
import dependencies
or import individual dependencies using
from dependencies.my_script import my_class
PS: Any changes to your dependencies.zip will not be reflected in your imports; you'll have to restart the PySpark kernel to use the updated scripts.
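For completeness, a minimal sketch of producing and uploading dependencies.zip (the bucket and path are placeholders; this assumes it is run in the folder that contains dependencies/ and that gsutil is on the PATH):
import shutil
import subprocess

# Create dependencies.zip containing the dependencies/ package folder
shutil.make_archive('dependencies', 'zip', root_dir='.', base_dir='dependencies')

# Upload the archive to GCS so addPyFile can reach it
subprocess.run(
    ['gsutil', 'cp', 'dependencies.zip', 'gs://your_bucket/path_to_file/dependencies.zip'],
    check=True,
)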
Unfortunately this is not supported, but as a workaround you can download the .py file and then import it - details can be found in the answer to a similar question:
Dataproc import python module stored in google cloud storage (gcs) bucket.

Python import error in Jupyter Notebook Pycharm 2019 IDE

I recently installed PyCharm 2019 Professional Edition on Windows 10. I created a new project 'Sample' and two files, 'file1.py' and 'file2.ipynb'. I have installed Jupyter Notebook for the chosen Python interpreter.
class Foo is defined in file1.py
I then import file1.py into file2.ipynb to use Foo
Here I encounter a strange error. (The 2019 Professional Edition has a local notebook server built in.)
from file1 import Foo
I ran the above code in a cell both in the editor inside the PyCharm IDE and in the browser, after starting the Jupyter server on localhost.
For some reason, the code throws an ImportError in the IDE but runs smoothly in the browser. I checked whether there are any issues with the project path but couldn't figure out the reason. Has anyone encountered this before?
(Screenshot of the ImportError omitted.)
os.getcwd() helped me understand the problem. The Jupyter notebook editor is running under the main directory 'Sample', whereas my files are in 'Sample/Source/file1.py' and 'Sample/Source/file2.ipynb'.
Now, if I import using the following commands in the file2.ipynb file, it works:
import os
print(os.getcwd())
#%%
from Source import file1
print(file1.Foo())
Thank you very much for the help, @Vishal and @IonicSolutions.
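An alternative sketch, assuming you want the notebook to import file1 regardless of its working directory, is to add the Source folder to sys.path explicitly:
import os
import sys

# Make the Source folder importable from the project root
sys.path.append(os.path.join(os.getcwd(), 'Source'))

from file1 import Foo
print(Foo())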

How to make a python file visible to python notebook?

I have a .py file I want to import into my notebook. The file is in the same directory as my .ipynb file. How do I make it visible to that notebook (say the files are named ./library.py and ./experiment.ipynb)?
Using Python 3, you should be able to just do import library (or from . import library if the directory is set up as a package). More generally, you need the file to be on your $PYTHONPATH (this is unrelated to Jupyter/IPython; it's a Python thing), and you should consider packaging your file: it will probably take you five minutes to make it installable and redistributable.
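As a concrete illustration of the packaging suggestion, a minimal sketch (the setup.py contents are an assumption, not from the original answer; file names match the question):
# setup.py, placed next to library.py
from setuptools import setup

setup(
    name="library",
    version="0.1.0",
    py_modules=["library"],   # ship the single module library.py
)
After running pip install -e . in that directory, import library works from any notebook that uses that interpreter.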
