I have a Python/PySpark project with the following structure:

project
    __init__.py
    module1
        __init__.py
        file1.py
        file_run1.py
    module2
        __init__.py
        file2.py
        file_run2.py
    shared_allmodules
        __init__.py
        func1.py
        func2.py
file_run1.py:

from shared_allmodules import func1, func2
from module1 import file1

file2.py:

from shared_allmodules import func2
I have this structure in CDSW and it works there. But now I have to move all the files onto a Unix server and run them from there.

But when I run

spark2-submit file_run1.py

from the module1 directory, I get the error "no module named shared_allmodules".

I'm new to Python/PySpark and I don't know what I have to do so that my submodules are recognized on Unix. I don't have a main.py because I don't know how to use one, and I don't have an if __name__ == "__main__" block either. My .py files contain a lot of PySpark code; I only wrote out part of the directory structure here.

Do you know what I have to do in order to run .py files on Unix that import modules from other directories?
You need to set the environment variable PYTHONPATH, which defines the directories visible to the Python interpreter (ones outside site-packages), or install your modules into the system using setuptools.
Example:
export PYTHONPATH=/path/to/project:$PYTHONPATH
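
For the PySpark question above, a minimal sketch (assuming the project root is /path/to/project, i.e. the directory containing module1 and shared_allmodules):

export PYTHONPATH=/path/to/project:$PYTHONPATH
spark2-submit /path/to/project/module1/file_run1.py

With the project root on PYTHONPATH, from shared_allmodules import func1, func2 resolves no matter which directory you submit from.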
Related
Assume I have the following file structure:

Package/
    __init__.py
    A.py
    B.py

Inside __init__.py I have the following:

__init__.py:

import numpy as np
import pandas as pd

Then I issue the following in the A.py script:

A.py:

from Package import *
However, I receive an error message that no module named Package is defined:

ModuleNotFoundError: No module named 'Package'

I thought from Package import * meant running everything in __init__.py. I can run the A.py content and use the __init__ imports from B.py as I expected (using from Package import *).
I am using VSCode and Anaconda and my OS is Windows 10.
I can append the project folder to my PYTHONPATH every time using the following:

import sys
sys.path.append("Path to the Package")

But I do not want to run this piece of code every time.
Can anyone explain what the problem is? Is this a new problem in Python? I do not recall having such issues in the past.
This is because when you run B.py, the parent folder of the Package folder is added to sys.path, which is equivalent to adding sys.path.append("Path to the Package") in the A.py file. But when you run A.py, it adds the Package folder itself, instead of the parent folder of the Package folder, to sys.path.
sys.path:

A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.
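
A quick way to see which directory ended up first on the search path, a check you can drop at the top of either script:

import sys
print(sys.path[0])  # the directory containing the script you invoked

Comparing the output when running A.py versus B.py shows exactly which folder Python added.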
If you are running the Python file in debug mode (F5), and the Package folder is a subfolder of your workspace, you can configure the PYTHONPATH in the launch.json file:
"env": {
"PYTHONPATH": "${workspaceFolder}"
},
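
For context, that "env" block sits inside one of the entries in .vscode/launch.json; a minimal sketch of the surrounding file (the other fields are standard debug-configuration defaults, not part of the original answer):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            }
        }
    ]
}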
In your A.py script, just import the module directly (import B, without the .py extension) and do not use the star. Or, put all your files in a second Package2 directory and use from Package2 import * in your current A.py file.
I have a folder with a few Python files (modules) that I want to be globally accessible on macOS Catalina.

On Windows, I just moved the folder into the Python path under Lib/site-packages/, but I can't seem to find a way to do that on macOS.
Here is my file structure:
myfolder
    - __init__.py
    - file1.py
    - file2.py
and I want to access those modules in my python scripts like this:
from myfolder.file1 import func
from myfolder.file2 import func2
I tried adding the folder to the path like this:
# ~/.zprofile
export PATH=/Users/username/dev/myfolder
but when running the python script, this error would be thrown: ModuleNotFoundError: No module named 'myfolder'
The solution was to export the directory that contains the folder I want to be globally accessible (myfolder) as PYTHONPATH:
# ~/.zprofile
export PYTHONPATH=/Users/myusername/dev
Here is the folder structure:

- dev
    - myfolder
        - __init__.py
        - file1.py
        - file2.py
and here is how I can access those modules in any python script on my machine:
from myfolder.file1 import func
from myfolder.file2 import func2
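
To verify the setup from a new shell (so that ~/.zprofile has been re-read), a quick sanity check:

import myfolder

# Should print /Users/myusername/dev/myfolder/__init__.py
# once PYTHONPATH points at the dev directory.
print(myfolder.__file__)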
I am using Python 2.7 and have the following project structure:

main-folder
    folder1
        script.py
    folder2
        scr.py
    abc.py
    util.py
I am trying to import abc.py into util.py using

from main-folder import abc

but I am getting an error as below:

ImportError: No module named main-folder

I also tried to append the path to main-folder to the path using

sys.path.append(r'path/to main-folder/main-folder')

I also have __init__.py in main-folder, folder1, and folder2.
I'll assume your package is not actually called main-folder because that's a syntax error.
sys.path / PYTHONPATH is where Python looks for modules, so adding a folder to sys.path means what's in it can be imported (as a top-level module); it doesn't make the folder itself importable.

When you run a script as a Python file, Python adds that file's folder to the PYTHONPATH, e.g. here if you run main-folder/folder1/script.py, main-folder/folder1 is what's on your PYTHONPATH, and that obviously can't reach abc or util no matter how you slice it.

import <foo> or from <foo> import <bar> is an absolute import; it starts its search from the PYTHONPATH [0].

You can specify PYTHONPATH on the command line, e.g. PYTHONPATH=. python main-folder/folder1/script.py will *also* add whatever `.` is (the current directory) to your PYTHONPATH, which may be what you want.

Within a package (a directory with an __init__ and a bunch of submodules), it's probably better to use relative imports, e.g. util should use from . import abc if they're supposed to be sibling submodules of the same package (see the sketch below).

[0] That's not actually true for Python 2, as PEP 328 necessarily had to keep the old behaviour working, but you probably want to assume it regardless; you can "opt out" of the old behaviour by using the __future__ import listed in the PEP.
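
A minimal sketch of that relative-import suggestion (assuming the package is renamed to something importable, say mainpackage, since main-folder is not a valid identifier):

# mainpackage/util.py  (Python 2.7)
from __future__ import absolute_import  # opt in to the PEP 328 semantics

from . import abc  # the sibling submodule abc.py, not the stdlib abc module

Then run the script with the package's parent directory on the path, e.g. PYTHONPATH=. python -m mainpackage.folder1.script from the directory that contains mainpackage.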
I have the following directory structure:
app/
    bin/
        script1.py
        script2.py
    lib/
        module1/
            __init__.py
            module1a.py
            module1b.py
        __init__.py
        module2.py
    Dockerfile
My problem is that I want to execute script1.py and script2.py, but inside those scripts, I want to import the modules in lib/.
I run my scripts from the root app/ directory (i.e. adjacent to Dockerfile) by simply executing python bin/script1.py. When I import modules into my scripts using from lib.module1 import module1a, I get ImportError: No module named lib.module1. When I try to import using relative imports, such as from ..lib.module1 import module1a, I get ValueError: Attempted relative import in non-package.
When I simply fire up the interpreter and run import lib.module1 or something, I have no issues.
How can I get this to work?
In general, you need __init__.py under app and bin; then you can do a relative import, but that expects a package.
If you structured your Python code as a Python package (egg/wheel), then you could also define an entry point that would become your /bin/ file post-install.

Here is an example of a package: https://python-packaging.readthedocs.io/en/latest/minimal.html

And this blog explains entry points quite well: https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/

That way you could just do python setup.py install on your package and then have those entry points available within your PATH; as part of that, you would start to structure your code in a way that does not create import issues.
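
A minimal setup.py sketch of that idea (the package name, version, and the main function referenced by the entry point are illustrative, not taken from the original post):

# setup.py -- hypothetical packaging for the app/ layout above
from setuptools import setup, find_packages

setup(
    name='app',
    version='0.1',
    packages=find_packages(),  # finds lib and lib.module1 via their __init__.py files
    entry_points={
        'console_scripts': [
            # installs a 'script1' command on PATH that calls a main() in lib/module2.py
            'script1 = lib.module2:main',
        ],
    },
)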
You can add to the Python path at runtime in script1.py:

import sys
# Make the app/ directory importable before importing from lib/
sys.path.insert(0, '/path/to/your/app/')

import lib.module1.module1a
You have to add the current directory to the Python path. Use export in the terminal, or sys.path.insert in your Python script; both are fine.
Here we go with my first ever Stack Overflow question. I did search for an answer but couldn't find a clear one. Here's the situation. I've got a structure like this:

myapp
    package/
        __init__.py
        main.py
        mod1.py
        mod2.py
Now, in this scenario, from main.py I am importing mod1.py, which also needs to be imported by mod2.py. Everything works fine; my imports look like this:
main.py:
from mod1 import Class1
mod2.py:
from mod1 import Class1
However, I need to move my main.py up to the root of the folder structure, like this:

myapp
    main.py
    package/
        __init__.py
        mod1.py
        mod2.py
And now what happens is that of course I need to change the way I import mod1 inside main.py:
from package.mod1 import Class1
However, what also happens is that, in order not to get an "ImportError: No module named 'mod1'", I have to make the same type of change inside mod2.py:

from package.mod1 import Class1

Why is that? mod2 is in the same folder/package as mod1, so why, upon modifying main.py, am I expected to modify my import inside mod2?
The reason this is happening is how Python looks for modules and packages when you run a Python script as the __main__ script.

When you run python main.py, Python will add the parent directory of main.py to sys.path, meaning packages and modules within that directory become importable. When you moved main.py, you changed the directory that gets added to sys.path.

Generally, you don't want to rely on this mechanism for importing your modules, because it doesn't allow you to move your script, and your packages and modules are only importable when running that main script. What you should do instead is make sure your package is installed into a directory that is already on the path. There are several ways of doing this, but the most common is to create a setup.py script and actually install your Python package into the Python installation on your computer.
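
As a side note on the mod2.py question, a relative import inside the package avoids having to touch mod2.py when the entry-point script moves (a sketch, assuming Class1 lives in mod1.py as shown):

# package/mod2.py
from .mod1 import Class1  # resolves to the sibling module mod1.py,
                          # regardless of where main.py lives

This works because relative imports are resolved against the package's own name, not against whatever directory the __main__ script added to sys.path.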