Cannot import from a git submodule - python

I'm having difficulties making a git submodule work.
I have a project ProjectA that basically is a mainA.py file and a subfolder with library files:
The mainA.py contains a MainClass that is basically what should be called, and Libraries just contain scripts and classes for computations.
ProjectA/
    Libraries/
        __init__.py
        library1.py
        library2.py
    __init__.py
    mainA.py
In mainA.py I just do something like:
# content of mainA.py
from Libraries.library1 import ClassA, ClassB

class MainClass:
    # do stuff
    pass

if __name__ == '__main__':
    MainClass()
This works fine, but I now have a ProjectB that needs to use the MainClass from ProjectA, so I decided to add ProjectA as a git submodule of ProjectB:
git submodule add ProjectA_git_url
ProjectB/
    ProjectA/
    mainB.py
    .gitmodules
However, importing MainClass from ProjectA in mainB.py now fails:
# content of mainB.py
from ProjectA.mainA import MainClass
ModuleNotFoundError: No module named 'Libraries'
I think this happens because now Libraries is no longer hanging from the root directory, but inside the submodule ProjectA, so when mainA.py does:
from Libraries.library1 import ClassA, ClassB
The system cannot find Libraries.
If I change mainA.py to do:
from ProjectA.Libraries.library1 import ClassA, ClassB
Then it works, but of course I don't want to change anything inside ProjectA; it is just a project that should work either standalone or as a submodule of another project.
What am I doing wrong? Is there a way to import MainClass from mainA.py when ProjectA is a submodule?

git is a development tool; you use it during development but not deployment. pip is a deployment tool; during development you use it to install necessary libraries; during deployment your users use it to install your package with dependencies.
Use submodules when you need something from a remote repository in your development environment. For example, if said remote repository contains Makefile(s) or other non-python files that you need and that usually aren't installed with pip.
That is, in your case you shouldn't make ProjectA a submodule of ProjectB, you should make ProjectA a Python dependency. Create packages for ProjectA and ProjectB and install them separately but allow ProjectB to import from ProjectA.
Dependencies are declared in setup.py or requirements.txt.
That said, if you insist on using submodules: either manipulate sys.path yourself or use a relative import in mainA.py:
from .Libraries.library1 import ClassA, ClassB
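If ProjectA has to keep working both standalone and as a submodule, one common pattern is to try the relative import first and fall back to the absolute one. This is only a sketch, assuming the layout from the question; the temporary directory, the empty class bodies and the generated file contents are hypothetical stand-ins so the example runs on its own:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical content for ProjectA/mainA.py: relative import first,
# absolute import as a fallback for standalone runs.
MAIN_A = '''\
try:
    # Works when mainA is imported as part of a package (ProjectA.mainA).
    from .Libraries.library1 import ClassA, ClassB
except ImportError:
    # Works when mainA.py is run directly (python mainA.py).
    from Libraries.library1 import ClassA, ClassB


class MainClass:
    pass
'''

# Build a throwaway copy of the ProjectB/ProjectA layout.
project_b = Path(tempfile.mkdtemp()) / "ProjectB"
libraries = project_b / "ProjectA" / "Libraries"
libraries.mkdir(parents=True)
(project_b / "ProjectA" / "__init__.py").write_text("")
(libraries / "__init__.py").write_text("")
(libraries / "library1.py").write_text("class ClassA: pass\nclass ClassB: pass\n")
(project_b / "ProjectA" / "mainA.py").write_text(MAIN_A)

# mainB.py lives in ProjectB, so ProjectB is the import root here.
sys.path.insert(0, str(project_b))
importlib.invalidate_caches()
MainClass = importlib.import_module("ProjectA.mainA").MainClass
print(MainClass)
```

One caveat with this pattern: the except branch also runs if library1 itself raises an ImportError, which can hide the real cause, so it is worth keeping the try block as small as shown.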

Add the submodule path to the system path in mainB.py.
Say your submodule path is "../ProjectB/ProjectA".
Then sys.path.append("../ProjectB/ProjectA") (after import sys) resolves the issue.
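A relative string like the one above is resolved against the current working directory, so it can break when the script is launched from somewhere else. A minimal sketch that anchors the path on the file's own location instead; the helper name add_to_sys_path is hypothetical, and the "ProjectA" folder name is taken from the question:

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> str:
    """Append `directory` to sys.path once and return the entry that was used."""
    entry = str(directory.resolve())
    if entry not in sys.path:
        sys.path.append(entry)
    return entry


# In mainB.py: anchor on this file's location, not on the working directory.
# The "__file__" guard only keeps the sketch usable in an interactive session.
here = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
submodule_entry = add_to_sys_path(here / "ProjectA")

# After this, `from mainA import MainClass` would resolve inside ProjectA,
# and mainA's own `from Libraries.library1 import ...` would resolve as well.
```

This is still a path hack, so it only makes sense while the submodule layout is a given.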

Related

Struggling with python's import mechanism

I am an experienced Java enterprise developer but very new to a Python enterprise development shop. I am currently struggling to understand why some imports work while others don't.
Some background: Our dev team recently upgraded python from 3.6 to 3.10.5 and following is our package structure
src/
bunch of files (dockerfile, Pipfile, requirements.txt, shell scripts, etc)
package/
__init__.py
moduleA.py
subpackage1/
__init__.py
moduleX.py
moduleY.py
subpackage2/
__init__.py
moduleZ.py
tests/
__init__.py
test1.py
Now, inside the moduleA.py, I am trying to import subpackage2/moduleZ.py like so
from .subpackage2 import moduleZ
But, I get the error saying
ImportError: attempted relative import with no known parent package
The funny thing is that if I move moduleA.py out of package/ and into src/ then it is able to find everything. I am not sure why this is the case.
I run moduleA.py by executing python package/moduleA.py.
Now, I read that maybe there is a problem because you have to give a -m parameter if running a module as a script (or something along those lines). But if I do that, I get the following error:
ModuleNotFoundError: No module named 'package/moduleA.py'
I even tried running package1/moduleA without the .py, but that does not work either. I can understand why, as I technically never installed it?
All of this happened because apparently, the tests broke and to make it work they added relative imports. They changed the import from "from subpackage2 import moduleZ" to "from .subpackage2 import moduleZ" and the tests started working, but the app started failing.
Any understanding I can get would be much appreciated.
The -m parameter is used with the import name, not the path. So you'd use python3 -m package.moduleA (with . instead of /, and no .py), not python3 -m package/moduleA.py.
That said, it only works if package.moduleA is locatable from one of the roots in sys.path. Shy of installing the package, the simplest way to make it work is to ensure your working directory is src (so package exists in the working directory):
$ cd path/to/src
$ python3 -m package.moduleA
and, with your existing setup, if moduleA.py includes a from .subpackage2 import moduleZ, the import should work; Python knows package.moduleA is a module within package, so it can use a relative import to look for a sibling package to moduleA named subpackage2, and then inside it it can find moduleZ.
Obviously, this is brittle (it only works if you cd to the src root directory before running Python, or hack the path to src into PYTHONPATH, which is a terrible hack if the code ever has to be run by anyone else); ideally you make this an installable package, install it (in global site-packages, user site-packages, or within a virtual environment created with the built-in venv module or the third-party virtualenv module), and then your working directory no longer matters (since site-packages will be part of your sys.path automatically). For simple testing, as long as the working directory is correct (not sure what it was for you), and you use -m correctly (you were using it incorrectly), relative imports will work, but it's not the long-term solution.
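To see the difference between the two invocations side by side, here is a small self-contained sketch. It recreates a cut-down version of the package in a temporary directory (the file contents and the VALUE constant are made up for the example) and runs moduleA both ways from src:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway copy of src/package/ with only the files needed for the demo.
src = Path(tempfile.mkdtemp())
pkg = src / "package"
(pkg / "subpackage2").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "subpackage2" / "__init__.py").write_text("")
(pkg / "subpackage2" / "moduleZ.py").write_text("VALUE = 42\n")
(pkg / "moduleA.py").write_text(
    "from .subpackage2 import moduleZ\n"
    "print(moduleZ.VALUE)\n"
)

# Run as a module: Python knows moduleA belongs to `package`.
as_module = subprocess.run(
    [sys.executable, "-m", "package.moduleA"],
    cwd=src, capture_output=True, text=True,
)

# Run as a plain script: moduleA has no parent package, so `.` is meaningless.
as_script = subprocess.run(
    [sys.executable, "package/moduleA.py"],
    cwd=src, capture_output=True, text=True,
)

print("python -m package.moduleA  ->", as_module.returncode, as_module.stdout.strip())
print("python package/moduleA.py ->", as_script.returncode,
      as_script.stderr.strip().splitlines()[-1])
```

The -m run should exit cleanly, while the script run should fail with the "attempted relative import with no known parent package" error from the question.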
First of all, the root directory for imports is the directory that contains the main script you run (or the current working directory when you use python -m).
By default, this directory is the root for all imports from all scripts.
So if your main script sits in the src directory, you can write imports like these:
from package.moduleA import *
from package.subpackage1.moduleX import *
But now in files moduleA and moduleX you need to make imports based on root folder. If you want to import something from module moduleY inside moduleX you need to do:
# this is inside moduleX
from package.subpackage1.moduleY import *
This is because python is looking for modules in specific locations.
The first location is your root directory: the directory that contains the main script you execute.
The second location is the directory with modules installed by pip (site-packages).
You can check all directories using following:
import sys
for p in sys.path:
    print(p)
Now, to solve your problem there are a couple of solutions.
The fast one but IMHO not the best one is to add all paths with submodules to sys.path - list variable with all directories where python is looking for modules.
import sys
new_path = "/path/to/application/app/folder/src/package/subpackage1"
if new_path not in sys.path:
    sys.path.append(new_path)
Another solution is to use full path for imports in all package modules:
from package.subpackage1.moduleX import *
I think in your case it will be the correct solution.
You can also combine the two solutions.
First add the folders with subpackages to sys.path and use the subpackage folders as root folders for imports. But this is a good solution only if you have a complex submodule structure, and it is not the best one if you later need to deploy your package as a wheel or share it between multiple projects.
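One detail worth checking for yourself: when a file is run as python path/to/script.py, the first entry of sys.path is the folder that contains the script, which matches the working directory only if you launch it from that folder. A small sketch, using a hypothetical show_path.py in a temporary directory:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: <root>/src/package/show_path.py
root = Path(tempfile.mkdtemp())
script = root / "src" / "package" / "show_path.py"
script.parent.mkdir(parents=True)
script.write_text("import sys\nprint(sys.path[0])\n")

# Launch the script from <root>, two levels above the script itself.
result = subprocess.run(
    [sys.executable, str(script)],
    cwd=root, capture_output=True, text=True,
)
first_entry = Path(result.stdout.strip())

print("working directory:", root)
print("sys.path[0]      :", first_entry)
```

The printed sys.path[0] should be the package folder, not the directory the command was launched from, which is why moving a script between folders changes which imports resolve.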

Git submodule's local import error - Python

I'm working on a Python project (Project A) that uses another project from GitHub (Project B). I'm not a Git expert, so after some quick research I found out that I should use Project B as a git submodule.
So, I cd project_A_root and did the following:
git submodule add project_B
git submodule init
git submodule update
Now, my project structure looks like this:
In main.py file, I've imported a method from do_something.py.
main.py
from ProjectB.do_something import foo
However, do_something.py file imports a method from util.py file, and that's where the problem occurs.
do_something.py
from util import bar
Project B is a submodule and it assumes that Project B dir is the root of the project, so method from util.py in do_something.py is imported without specifying the package, and I'm getting an error:
ImportError: cannot import name 'bar' from 'util'
Instead, it should be imported like this:
from ProjectB.util import bar
I'm not sure what is the best way to handle this.
I've fixed the imports in the submodule manually, but I cannot push those changes to Git because that's not how submodules work, so anyone who clones Project A must fix the imports manually too.
Any help is welcome.
Try this in the head of main.py:
import sys
sys.path.append("ProjectB")
#### your old code ###
....
Git is just a version control system. Unfortunately, it cannot handle this for you.
A possible solution is patching the sys.path variable by adding the ProjectB directory, but this is a hack.
The best you can do is use the Python packaging system: package ProjectB as a pip package and install it like any other package with pip.
Useful links:
https://packaging.python.org/
https://python-packaging.readthedocs.io/en/latest/index.html
https://docs.python.org/3/distutils/setupscript.html
https://python-poetry.org/

Python folder structure for project directory and easy import

My team has a folder of several small projects in python3. Amongst them, we have a utility folder with several utility functions, that are used throughout the projects. But the way to import it is very uncomfortable. This is the structure we use:
temp_projects
util
storage.py
geometry.py
project1
project1.py
project2
project2.py
The problem is that the import in the projects looks terrible:
import os
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import util.geometry
util.geometry.rotate_coordinates(....)
Also, PyCharm and other tools have trouble understanding it and providing completion.
Is there some neater way to do that?
Edit:
All of the projects and utils are very much work-in-progress and are modified often, so I'm looking for something as flexible and comfortable as possible
The PYTHONPATH environment variable might be the way to go. Just set it to the projects folder location:
PYTHONPATH=/somepath/temp_projects
and you'll be able to use util as follows:
import util.geometry
util.geometry.rotate_coordinates(....)
Also this will be recognized by PyCharm automatically.
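To try the effect without touching your shell configuration, the variable can be set for a single child process. This is a sketch only: it rebuilds a tiny temp_projects folder in a temporary directory, the body of rotate_coordinates is a placeholder, and the empty util/__init__.py is an assumption (the listing in the question does not show one):

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the temp_projects folder from the question.
projects = Path(tempfile.mkdtemp()) / "temp_projects"
(projects / "util").mkdir(parents=True)
(projects / "project1").mkdir()
(projects / "util" / "__init__.py").write_text("")
(projects / "util" / "geometry.py").write_text(
    "def rotate_coordinates(x, y):\n"
    "    # Placeholder body: rotate by 90 degrees counter-clockwise.\n"
    "    return (-y, x)\n"
)
(projects / "project1" / "project1.py").write_text(
    "import util.geometry\n"
    "print(util.geometry.rotate_coordinates(1, 0))\n"
)

# Equivalent of running `PYTHONPATH=/somepath/temp_projects python project1.py`.
env = dict(os.environ, PYTHONPATH=str(projects))
result = subprocess.run(
    [sys.executable, str(projects / "project1" / "project1.py")],
    env=env, capture_output=True, text=True,
)
print(result.returncode, result.stdout.strip())
```

Note that project1.py contains no sys.path manipulation at all; the import resolves only because of the environment variable.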
I believe the correct route is completely different from what you are doing right now. Each project should be stored in its own Git repository, and shared modules should be added as git submodules. Once those projects get larger and more complex (and they probably will), it will be easier to manage them separately.
In short
Projects structure should be:
Project_1
|- utils <submodule>
   |- storage.py
   |- geometry.py
|- main.py
Project_2
|- utils <submodule>
   |- storage.py
   |- geometry.py
|- main.py
Working with submodules
### Adding a submodule to an existing git directory
git submodule add <git@github ...> <optional path/to/submodule>
### Pulling latest version from master branch for all submodules
git submodule update --recursive --remote
### Removing submodule from project
# Remove the submodule entry from .git/config
git submodule deinit -f path/to/submodule
# Remove the submodule directory from the project's .git/modules directory
rm -rf .git/modules/path/to/submodule
# Remove the entry in .gitmodules and remove the submodule directory located at path/to/submodule
git rm -f path/to/submodule
Further reading https://git-scm.com/book/en/v2/Git-Tools-Submodules
According to "Importing files from different folder", adding an __init__.py in the util folder will cause Python to treat it as a package. Another thing you can do is use import util.geometry as geometry, and then you can call geometry.rotate_coordinates(...), which also improves readability.
If you create a setup.py file for your util module you can install it simply with pip. It will handle everything for you. After installation you can import it across the system.
import util
pip install
# setup.py is in current folder
sudo -H pip3 install .
or if the util module itself is still under development you can install it with the -e editable option. Then it automatically updates the installation when you make changes to the code.
sudo -H pip3 install -e .
For project administration I recommend using git as @Michael liv. is suggesting, especially if you work in a team.
Use importlib.
import importlib.util

def module_from_file(module_name, file_path):
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

geometry = module_from_file("geometry", "../temp_projects/utils/geometry.py")
geometry.rotate_coordinates(...)
Another option (I used this approach in my project):
Let's assume that the projects are run from the project1.py and project2.py files respectively.
At the top of these files you can add the following imports and actions:
import sys
import os
sys.path.append(os.path.join(os.getcwd(), os.pardir))
import your_other_modules
your_other_modules.py will contain the following for importing utils:
from utils import storage
from utils import geometry
# or from project2 import project2, etc..
Maybe it's not the best way, but it is one more option. I hope it will be helpful for someone.

python importing level 2 packages

I have added a git submodule in my project. Now all the imports in that submodule are broken because I have to use the full path of import.
For example, if the structure is like this:
Myproject:
- submodule_project:
-- package1:
--- code1.py
-- package2:
--- code2.py
Now, in code1.py there is from package2 import code2. It tells me that package2 is unresolved reference. It is only resolved if I change it to from submodule_project.package2 import code2.
I don't want this because I don't want to change anything in the submodule. I just added it to use some of its packages in my project and to get regularly updated whenever its developers update it.
If you want package2 to be a top-level importable package, its parent directory (submodule_project in your case) has to be in sys.path. There are many ways to do it: sys.path.append(), sys.path.insert(), the PYTHONPATH environment variable.
Or maybe you don't want to have the code as a submodule at all. It doesn't make sense to have a submodule if the code in the submodule uses absolute imports instead of relative ones (from ..package2 import code2). Maybe the package should be installed in site-packages (global or in a virtual environment) rather than attached to the project as a submodule.

Directory Structure for Importing Python Package using __init__.py

I've got a python project (projectA), which I've included as a git submodule in a separate project (projectB) as a subfolder. The structure is laid out as
projectB/
projectB_file.py
projectA/ (repository)
projectA/ (source code)
module1/
file1.py (contains Class1)
file2.py (contains Class2)
tests/
test_file1.py
I'm trying to figure out how to layout __init__.py files so in projectB_file.py I can import Class1 and Class2 as
from projectA.module1 import Class1
from projectA import Class2
I think having the top level projectA part of the import will be a mistake. That will require you to write your imports with projectA duplicated (e.g. import projectA.projectA.module1.file1).
Instead, you should add the top projectA folder to the module search path in some way (either as part of an install script for projectB, or with a setting in your IDE). That way, projectA as a top-level name will refer to the inner folder, which you actually intend to be the projectA package.
You'll need an __init__.py in each subdirectory you want to treat as a package. Which in your case means one in:
projectA
projectA/projectA
projectA/projectA/module1
projectA/projectA/tests
It would definitely be better if you could lose that first projectA folder.
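As for getting the exact import lines from the question, the __init__.py files can re-export the classes. A rough sketch, assuming file2.py sits next to file1.py inside module1 and that the outer projectA folder (the repository) is what gets added to the search path; the temporary directory and the empty class bodies are stand-ins so the example runs on its own:

```python
import importlib
import sys
import tempfile
from pathlib import Path

repo = Path(tempfile.mkdtemp()) / "projectA"   # outer folder (the repository)
module1 = repo / "projectA" / "module1"        # inner folder is the real package
module1.mkdir(parents=True)
(module1 / "file1.py").write_text("class Class1: pass\n")
(module1 / "file2.py").write_text("class Class2: pass\n")

# module1/__init__.py re-exports both classes, so callers need no file names.
(module1 / "__init__.py").write_text(
    "from .file1 import Class1\n"
    "from .file2 import Class2\n"
)
# projectA/__init__.py lifts Class2 one level further up.
(repo / "projectA" / "__init__.py").write_text("from .module1 import Class2\n")

# Add the repository folder, not the inner package, to the search path.
sys.path.insert(0, str(repo))
importlib.invalidate_caches()

from projectA.module1 import Class1
from projectA import Class2

print(Class1.__module__, Class2.__module__)
```

In this variant only the inner folders need an __init__.py, because the outer folder is used as a search path entry and not as a package.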
This is an annoying issue. The main approach I use is to edit the PYTHONPATH. This is essentially how I do it.
ProjectB/                        # Main Repository
    projectb/                    # Package/library
        __init__.py
        projectB_file.py
    docs/
    tests/
    setup.py                     # Setup file to distribute the library
    freeze.py                    # CX_Freeze file to make executables
    ProjectA/                    # Subrepository
        projecta/                # Package/library
            __init__.py
            projectA_file.py
            file2.py
            submodule/
                __init__.py
                file3.py
        pa_module1/              # Additional package/library in 1 project
            __init__.py
            file1.py
        docs/
        tests/
        setup.py
With this setup I have to edit the python path before running ProjectB's code. I use this structure, because it is easier for deployment and distribution. The subrepository helps track the version of ProjectA for each release of ProjectB.
cd ProjectB
export PYTHONPATH=${PWD}:${PWD}/ProjectA
python projectb/projectB_file.py
projectB_file.py
from projecta import projectA_file
This is for my development environment. For a release environment, someone would install with the commands below.
# Need a correct setup.py to handle 2 packages for pa_module1 and projecta
pip install ./ProjectA  # installs projecta to site-packages
cd ..
pip install ./ProjectB  # installs projectb to site-packages
This means projectb and projecta are in site packages. From there any file can simply
from projectb import projectB_file
from projecta import projectA_file, file2
from pa_module1 import file1
# Personally I don't like imports that use project.sub.sub.sub ...
# I have seen it cause import confusion and problems, but it is useful in some situations.
from projecta.submodule import file3
#import projecta.projectA_file
#import projecta # if __init__.py has import code
file1.Class1()
file2.Class2()
file3.Class3()
Additionally, with pip you can install a library in development (editable) mode, which links to the real project directory instead of copying files to site-packages.
pip install -e ProjectA
