How to import a module into another module in databricks notebook? - python

This is my config.py in Databricks
DATA_S3_LOCATION='s3://server-data/data1'
DATA_S3_FILE_TYPE='orc'
DATA2_S3_LOCATION='s3://server-data/data2'
DATA2_S3_FILE_TYPE='orc'
I have init . py in this folder as well
I am trying to access these variables in another file
import sys
sys.path.insert(1,'/Users/file')
from file import config
I am facing error , no module named file

There are several aspects here.
If these files are notebooks, then you need to use %run ./config to include notebook from the current directory (doc)
if you're using Databricks Repos and arbitrary files support is enabled, then your code needs to be a Python file, not notebook, and have correct directory layout with __init__.py, etc. In this case, you can use Python imports. Your repository directory will be automatically added into a sys.path and you don't need to modify it.
P.S. I have an example of repository with both notebooks & Python files approaches.

Related

Is there a way to import python package correctly from external folder?

I have a python package folder, I am using regularly. As part of my work, I have multiple versions of this package folder.
The folder look something like this,
Main folder
scripts.ipynb
other stuff
package folder
init.py
all submodules folders
The init.py file consists of 'import package_name.submodule_name' lines. I have many main folders, like this one, for all the different versions. If I run the scripts.ipynb notebook, then it works correctly with the wanted package version.
But if I run the notebook, outside of the main_folder, it seems I can not import the package correctly. I want to have the ability to run single scripts notebook, where I can toggle and switch between the different main folders/packages.
Giving unique name for every package is very tedious work, because the name of the package is repeated numerous time along the files. I tried to append the python path to the system, or set the PYTHONPATH to look at the correct folder, but it doesn't help.
In the case of appending the python path to sys, the import routine goes to the correct init.py file, but inside the file it fails to import the submodules. When setting the PYTHONPATH to the folder, no error is given, but the functionality is incorrect (error at first call).

Is there a way to import and run functions from saved .py files in a Jupyter notebook running on a Google Cloud Platform dataproc cluster?

When running Jupyter notebook natively it is simple to import functions and utilities from a saved .py script.
When I work in a Jupyter notebook running on a Google Cloud Platform dataproc cluster and try the same thing- (after having uploaded a .py script to my dataproc Jupyter notebook- it is therefore in the cloud***) I am unable to import the function into the (dataproc) notebook.
Does anyone know how I can do this? Does it just have to do with figuring out the correct, but not obvious path? (I am trying to import a .py file from within the same folder as the Jupyter notebook, so if this were running natively it wouldn't require a path, but perhaps it is different with dataproc?
*** I am not making the mistake of trying to import a desktop/native .py script into a GC dataproc notebook.
Any help or leads would be very much appreciated!
If you are using PySpark kernel, you can add dependencies to sparkContext.
spark.sparkContext.addPyFile(f'gs://{your_bucket}/{path_to_file}/dependencies.zip')
Your dependencies.zip would contain a folder with all .py scripts and __init__.py:
dependencies/
├── __init__.py
└── my_script.py
You can then import all of your dependencies using
import dependencies
or import individual dependencies using
from dependencies.my_script import my_class
PS: Any changes to your dependencies.zip would not be reflected in your imports and you'll have to restart PySpark kernel to use the updated scripts.
Unfortunately this is not supported. But you can download the .py file then import, as a workaround - details can be found in the answer in a similar question:
Dataproc import python module stored in google cloud storage (gcs) bucket.

ModuleNotFoundError when running script from Terminal

I have the following folder structure:
app
__init__.py
utils
__init__.py
transform.py
products
__init__.py
fish.py
In fish.py I'm importing transform as following: import utils.transform.
When I'm running fish.py from Pycharm, it works perfectly fine. However when I am running fish.py from the Terminal, I am getting error ModuleNotFoundError: No module named 'utils'.
Command I use in Terminal: from app folder python products/fish.py.
I've already looked into the solutions suggested here: Importing files from different folder, adding a path to the application folder into the sys.path helps. However I am wondering if there is any other way of making it work without adding two lines of code into the fish.py. It's because I have many scripts in the /products directory, and do not want to add 2 lines of code into each of them.
I looked into some open source projects, and I saw many examples of importing modules from a parallel folder without adding anything into sys.path, e.g. here:
https://github.com/jakubroztocil/httpie/blob/master/httpie/plugins/builtin.py#L5
How to make it work for my project in the same way?
You probably want to run python -m products.fish. The difference between that and python products/fish.py is that the former is roughly equivalent to doing import products.fish in the shell (but with __name__ set to __main__), while the latter does not have awareness of its place in a package hierarchy.
This expands on #Mad Physicist's answer.
First, assuming app is itself a package (since you added __init__.py to it) and utils and products are its subpackages, you should change the import to import app.utils.transform, and run Python from the root directory (the parent of app). The rest of this answer assumes you've done this. (If it wasn't your intention making app the root package, tell me in a comment.)
The problem is that you're running app.products.fish as if it were a script, i.e. by giving the full path of the file to the python command:
python app/products/fish.py
This makes Python think this fish.py file is a standalone script that isn't part of any package. As defined in the docs (see here, under <script>), this means that Python will search for modules in the same directory as the script, i.e. app/products/:
If the script name refers directly to a Python file, the directory
containing that file is added to the start of sys.path, and the file
is executed as the __main__ module.
But of course, the app folder is not in app/products/, so it will throw an error if you try to import app or any subpackage (e.g. app.utils).
The correct way to start a script that is part of a package is to use the -m (module) switch (reference), which takes a module path as an argument and executes that module as a script (but keeping the current working directory as a module search path):
If this option is given, [...] the current directory
will be added to the start of sys.path.
So you should use the following to start your program:
python -m app.products.fish
Now when app.products.fish tries to import the app.utils.transform module, it will search for app in your current working directory (which contains the app/... tree) and succeed.
As a personal recommendation: don't put runnable scripts inside packages. Use packages only to store all the logic and functionality (functions, classes, constants, etc.) and write a separate script to run your application as you wish, putting it outside the package. This will save you from this kind of problems (including the double import trap), and has also the advantage that you can write several run configurations for the same package by just making a separate startup script for each.

How to import python file located in same subdirectory in a pycharm project

I have an input error in pycharm when debugging and running.
My project structure is rooted properly, etc./HW3/. so that HW3 is the root directory.
I have a subfolder in HW3, util, and a file, util/util.py. I have another file in util called run_tests.py.
In run_tests.py, I have the following import structure,
from util.util import my_functions, etc.
This yields an input error, from util.util import load_dataset,proportionate_sample
ImportError: No module named 'util.util'; 'util' is not a package
However, in the exact same project, in another directory (same level as util) called data, I have a file data/data_prep.py, which also imports functions from util/util.py using a similar import statement...and it runs without any problems.
Obviously, I am doing this in the course of doing a homework, so please understand: this is ancillary to the scope of the homework.
The problem goes away when I move the file to another directory. So I guess this question is How do I import a python file located in the same directory in a pycharm project? Because pycharm raises an error if I just do import util and prompts me to use the full name from the root.
Recommended Way:
Make sure to set the working folder as Sources.
You can do it in Pycharm -> Preferences -> Project: XYZ -> Project Structure
Select your working folder and mark it as Sources. Then Pycharm recognize the working folder as a Source folder for the project and you will be able to simply add other files within that folder by using
import filename.py
or
from filename.py import mudule1
=================
Not recommended way:
In Pycharmyou can simply add . before the .py file which you are going to import it from the same folder. In your case it will be
from .util import my_functions
Resource
There is a good reference also for more information with example how to implement Package Relative Imports. I would highly recommend to check this page.
Package Relative Imports
If you don't have an __init__.py create one and add this line
from util.util import my_function
then you can easily import the module in your scripts
the __init__.py tells python that it should treat that folder as a python package, it can also be used to import/load modules too.
in most cases the __init__.py is empty.
Quoting the docs:
The __init__.py files are required to make Python treat the
directories as containing packages; this is done to prevent
directories with a common name, such as string, from unintentionally
hiding valid modules that occur later on the module search path. In
the simplest case, __init__.py can just be an empty file, but it can
also execute initialization code for the package or set the __all__
variable, described later.
Right-click on the folder which you want to be marked as the source > Mark Directory as > Source root.
In my case, it worked only when I omit the extension. Example:
import filename
Note: May be a bit unrelated.
I was facing the same issue but I was unable to import a module in the same directory (rather than subdirectory as asked by OP) when running a jupyter notebook (here the directory didn't have __init__.py). Strangely, I had setup python path and interpreter location and everything. None of the other answers helped but changing the directory in python did.
import os
os.chdir(/path/to/your/directory/)
I'm using PyCharm 2017.3 on Ubuntu 16.04
I had the same issue with pycharm, but the actual mistake was that the file I was trying to import didn't have a .py extension, even though I was able to run it as a standalone script. Look in the explorer window and make sure it has a .py extension. If not, right click on the file in the explorer window, pick refactor, and then rename it with a .py extension.
In Pycharm go to "Run - Configuration" and uncheck
'Add Content root to Pythonpath' and
'Add source roots to Pythonpath',
then use
from filename import functionname
For me the issue was, the source directory was marked correctly, but my file to import was named starting with numeric value. Resolved by renaming it.
import 01_MyModuleToImport
to
import MyModuleToImport

Import own .py files in anaconda spyder

I've written my own mail.py module in spider (anaconda). I want to import this py file in other python (spider) files just by 'import mail'
I searched on the internet and couldn't find a clearly solution.
To import any python script, it should exist in the PYTHONPATH. You can check this with the following code:
import sys
print sys.path
To import your Python script:
Put both the scripts (main and the imported python script) in the
same directory.
Add the location of the file to be imported to the
sys.path.
For example, if the script is located as '/location/to/file/script.py':
import sys
sys.path.append('/location/to/file/')
import script
I had the same problem, my files were in same folder, yet it was throwing an error while importing the "to_be_imported_file.py".
I had to run the "to_be_imported_file.py" seperately before importing it to another file.
I hope it works for you too.
Searched for an answer for this question to. To use a .py file as a import module from your main folder, you need to place both files in one folder or append a path to the location. If you storage both files in one folder, then check the working directory in the upper right corner of the spyder interface. Because of the wrong working directory you will see a ModuleNotFoundError.
There are many options, e.g.
Place the mail.py file alongside the other python files (this works because the current working dir is on the PYTHONPATH.
Save a copy of mail.py in the anaconda python environment's "/lib/site-packages" folder so it will be available for any python script using that environment.
I did a slightly different solution approach that is less sophisticated. When I start my anaconda terminal it is at a C prompt. I just did a cd d:\mypython\lib in the beginning window before starting python. once I did that I could simply just import my own classes that I put in that library with "import MyClass as my" then I was off and running. It is interesting, I did 2 days of internet searching in my part time and could not find the answer either, until I asked a friend.
cd d:\mypython\lib
python
>>> import MyClass as my
>>> my1=my.MyClass()
>>> my1.doSomething()
worked for me on my anaconda / windows 10 environment python 3.6.6.
when calling any function from another file, it should be noted to not import any library inside the function
I believe the easiest solution is to place the directory containing your Python files into the Anaconda site-packages folder on your machine. I wrote an article outlining the whole process but, in short, you'll need to create a folder containing your python script and an __init__.py file. Then place that folder inside the site-packages folder in the Anaconda directory.
On Windows the site-packages directory is typically located at:
C:\Users\[your_username]\Anaconda3\Lib\site-packages\
On Mac the site-packages directory is typically located located at:
Users/[your_username]/opt/anaconda3/lib/[python3.8]/site-packages/
Notice, on Mac the Python version matters. You'll need to look for the directory that corresponds to the (base) Python version used by Anaconda. Also, anything I've placed inside of square brackets in the file paths above need to be changed according to your particular machine and Python version.
The file structure should look something like this:
~/
|__site-packages/
|__your_folder/
script.py
__init__.py
After you have the folder containing your script.py and __init__.py file moved into the site-packages subdirectory of Anaconda, you'll be able to import it from any script you run on your machine.

Categories