I have a project I am working on, let's call it Project, which lives in a directory called Project somewhere wholly unknown to me (really it lives both on my local system and on a couple of Docker build systems). In that project, I have some source files, source/module1.py and source/module2.py. I also have some example files, some test files, and an __init__.py. So my directory looks something like this:
Project
    __init__.py
    /source
        module1.py
        module2.py
    /test
        testRunner.py
    /examples
        awesomeExample.py
However, module1 needs some stuff from module2. My naive self thought this could be done by putting an import statement in module1:
import module2
# Do some other interesting stuff
And this works, but only when I am running or importing module1 from the source directory. If I am, for example, running some unit tests in test/testRunner.py, whether from the test directory or from the main Project directory, the import fails. The same goes for running an example in the examples directory.
So here is my problem: in general, I don't know where the calling script lives. It might be in the examples directory, it might be in the test directory, or it might be in the main Project directory (for example, when importing things via the __init__.py). How do I ensure that module1 can always import module2 in each of these scenarios?
I am not looking for a solution like "add all those directories to your python path". Initially I just added Project to my Python path on my local machine and did all my imports relative to that (import Project.source.module2), but this (predictably) caused my builds to fail on the Docker instances. I don't just want this to work on my local machine, but also on the Docker instances I'm using to build and test this software, and on any user's machine that subsequently installs it (i.e. by doing a pip install Project). What is the most robust way to make sure this dependency is satisfied? How can I make sure module1 can import module2 regardless of where module1 itself is imported from? Any Python 3.x.x solution is welcome.
I figured out a way to do it (credit here) - it's a little inelegant, but extremely robust. It works on my local machine whether module1 is imported or a script is run directly, as well as on my build servers, which use GitHub Actions and Travis CI.
Basically, I added a file in the source directory, called context.py with the following contents:
import os
import sys

# Absolute path of the directory containing this file (i.e. source/)
fileLocation = os.path.dirname(os.path.abspath(__file__))
# Go up one level and back down into source/, yielding an absolute path
# to the source directory no matter where the caller lives
sourceLocation = os.path.abspath(os.path.join(fileLocation, '..', 'source/'))
sys.path.insert(0, sourceLocation)
This determines the location of the file Python is currently executing, and uses it to put the source directory on the Python path. Then, at the top of my module1.py file, I have:
import context
import module2
Now, whenever module1 is imported, it successfully imports module2. More elegant answers or comments on why this works and in which cases it might fail are appreciated.
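For comparison, here is a minimal sketch of the relative-import alternative. This is an assumption on my part, not part of the setup above: it requires source/ to have its own __init__.py, and module1 to be imported as part of the package rather than run directly as a script:
# source/module1.py - a sketch of the relative-import alternative;
# running this file directly as a script would raise
# "attempted relative import with no known parent package"
from . import module2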
Related
I am an experienced Java enterprise developer but very new to Python enterprise development. I am currently struggling to understand why some imports work while others don't.
Some background: Our dev team recently upgraded Python from 3.6 to 3.10.5, and the following is our package structure:
src/
    bunch of files (Dockerfile, Pipfile, requirements.txt, shell scripts, etc.)
    package/
        __init__.py
        moduleA.py
        subpackage1/
            __init__.py
            moduleX.py
            moduleY.py
        subpackage2/
            __init__.py
            moduleZ.py
    tests/
        __init__.py
        test1.py
Now, inside moduleA.py, I am trying to import subpackage2/moduleZ.py, like so:
from .subpackage2 import moduleZ
But, I get the error saying
ImportError: attempted relative import with no known parent package
The funny thing is that if I move moduleA.py out of package/ and into src/ then it is able to find everything. I am not sure why this is the case.
I run moduleA.py by executing python package/moduleA.py.
Now, I read that maybe there is a problem because you have to give a -m parameter when running a module as a script (or something along those lines). But if I do that, I get the following error:
ModuleNotFoundError: No module named 'package/moduleA.py'
I even tried running package/moduleA with the .py removed, but that does not work either. I can understand why, as I technically never installed it?
All of this happened because, apparently, the tests broke, and to make them work someone added relative imports: the import was changed from "from subpackage2 import moduleZ" to "from .subpackage2 import moduleZ", and the tests started working, but the app started failing.
Any understanding I can get would be much appreciated.
The -m parameter is used with the import name, not the path. So you'd use python3 -m package.moduleA (with . instead of /, and no .py), not python3 -m package/moduleA.py.
That said, it only works if package.moduleA is locatable from one of the roots in sys.path. Shy of installing the package, the simplest way to make it work is to ensure your working directory is src (so package exists in the working directory):
$ cd path/to/src
$ python3 -m package.moduleA
and, with your existing setup, if moduleA.py includes from .subpackage2 import moduleZ, the import should work; Python knows package.moduleA is a module within package, so it can use a relative import to look for a sibling subpackage named subpackage2 and, inside it, find moduleZ.
Obviously, this is brittle; it only works if you cd to the src root directory before running Python, or hack the path to src into PYTHONPATH (a terrible hack if the code ever has to be run by anyone else). Ideally, you make this an installable package, install it (in the global site-packages, the user site-packages, or a virtual environment created with the built-in venv module or the third-party virtualenv module), and then your working directory no longer matters, since site-packages is part of sys.path automatically. For simple testing, as long as the working directory is correct (it's not clear what yours was) and you use -m correctly (you were using it incorrectly), relative imports will work, but that is not the long-term solution.
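For reference, here is a minimal sketch of the installable-package route; the file name, distribution name, and version are placeholders, and it assumes a setup.py placed in src alongside package:
# src/setup.py - a minimal sketch; name and version are placeholders
from setuptools import setup, find_packages

setup(
    name='package',
    version='0.1.0',
    packages=find_packages(exclude=['tests', 'tests.*']),
)
After pip install -e . from inside src, python -m package.moduleA works from any directory, because package becomes importable through site-packages.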
So, first of all: the root import directory is the directory containing the main script you execute (Python puts it at the front of sys.path).
By default, this directory is the root for imports in every module of your program.
So if your main script lives in the src directory, you can do imports like these:
from package.moduleA import *
from package.subpackage1.moduleX import *
But now, in moduleA and moduleX, imports must be written as full paths from that root folder. If you want to import something from moduleY inside moduleX, you need to do:
# this is inside moduleX
from package.subpackage1.moduleY import *
This is because Python looks for modules in specific locations.
The first location is your root directory, the one containing the main script you execute.
The second is the directory where modules installed by pip live (site-packages).
You can check all of these directories using the following:
import sys
for p in sys.path:
    print(p)
Now, to solve your problem, there are a couple of solutions.
The fast one, but IMHO not the best one, is to add all the submodule paths to sys.path, the list of directories where Python looks for modules.
new_path = "/path/to/application/app/folder/src/package/subpackage1"
if new_path not in sys.path:
    sys.path.append(new_path)
Another solution is to use full paths for imports in all package modules:
from package.subpackage1.moduleX import *
I think in your case it will be the correct solution.
You can also combine the two solutions.
First add the folders with subpackages to sys.path, and then use the subpackage folders as root folders for imports (a sketch follows below). But it's a good solution only if you have a complex submodule structure, and it's not the best solution if in the future you need to deploy your package as a wheel or share it between multiple projects.
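A sketch of that combination, placed in a module inside package/ (the paths and names here are illustrative):
# add subpackage1 itself as an import root, so its modules
# resolve without the package prefix
import os
import sys

subpkg_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'subpackage1')
if subpkg_dir not in sys.path:
    sys.path.append(subpkg_dir)

from moduleY import *  # now found directly inside subpackage1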
I've got a project running on a server with the structure
proj
    __init__.py
    module_a.py
    module_b.py
    main.py
And in the header of main.py, I import from other modules with the format
from .module_a import func1
from .module_b import func2
This runs fine on the server, but when I'm testing things on my local machine it raises the error:
ModuleNotFoundError: No module named '__main__.module_a'; '__main__' is not a package
There have been a lot of questions asked regarding this error and the accepted solution is almost always to replace the import statement with
from proj.module_a import func1
Is there something I can do to configure my local environment to allow this type of syntax without having a completely different set of import statements depending on whether the code is running locally or remotely?
Keep your imports relative, without using the package's full path, so that you keep the flexibility to rename the package as you wish, like in
from .module_a import func1
Then in your local environment, change your current dir to the proj parent folder and run:
python -m proj.main
An alternative would be to rename main.py to __main__.py; then simply running
python -m proj
will do. But that may affect the behaviour on the server if you copy the files as is.
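With that rename, the layout would look like this (a sketch):
proj
    __init__.py
    module_a.py
    module_b.py
    __main__.py    # renamed from main.py; python -m proj runs this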
Packages are meant to be imported. This is a common problem that appears when we start running arbitrary scripts located inside the package (in this case main.py) directly. If the package is simply imported from outside, everything works.
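A quick sketch of that working path, run from the directory that contains proj/ and keeping the original main.py name:
# executed from the parent directory of proj/
from proj.module_a import func1   # absolute import of the package: works
import proj.main                  # importing (not running) main works too,
                                  # since its relative imports now resolve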
I have a python project structured like this:
project/
----project/
--------__init__.py
--------process.py
--------config.py
----tests/
--------test_process.py
__init__.py is empty
config.py looks like this:
name = 'brian'
USAGE
I use the library by running python process.py from the project/project/ directory, or by specifying the absolute path to the Python file. I'm running Python 2.7 on Amazon EC2 Linux.
When process.py looks like below, everything works fine and process.py prints brian.
import config
print config.name
When process.py looks like below, I get the error ImportError: No module named project.config.
import project.config
print config.name
When process.py looks like below, I get the error ImportError: No module named project. This makes sense as the same behavior from the previous example should be expected.
from project import config
print config.name
If I add these lines to process.py to include the library root in sys.path, all of the configurations above work fine.
import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
MY CONFUSION
Many resources suggest setting up Python libraries to import modules using project.module_name, but sys.path appending doesn't seem to be standard, and it seems weird that I need it. I can see that the sys.path append added my library root to sys.path, but I thought that's what the __init__.py in my library root was supposed to do. What gives? What am I missing? I know Python importing creates lots of headaches, so I've tried to simplify this as much as possible to wrap my head around it. I'm going crazy and it's Friday before a holiday. I'm bummed. Please help!!
QUESTIONS
How should I set up my libraries? How should I import packages? Where should I have __init__.py files? Do I need to append my library root to sys.path in every project? Why is this so confusing?
Your project setup is alright. I renamed the directories just for clarity
in this example, but the structure is the same as yours:
repo_dir/
    project_package/
        __init__.py
        process.py
        config.py

    # Declare your project in a setup.py file, so that
    # it will be installable, both by users and by you.
    setup.py
When you have a module that wants to import from another module in
the same project, the best approach is to use relative imports. For example:
# In process.py
from .config import name
...
While working on the code on your dev box, do your work in a Python virtualenv,
and pip install your project in "editable" mode.
# From the root of your repo:
pip install -e .
With that approach, you'll never need to muck around with sys.path -- which
is almost always the wrong approach.
I think the problem is how you're running your script. If you want the script to live in a package (the inner project folder), you should run it with python -m project.process, rather than by filename. Then you can use absolute or explicit relative imports to get config from process.
An absolute import would be from project import config or import project.config.
An explicit relative import would be from . import config.
Python 2 also allows implicit relative imports, but they're a really bad misfeature that you should never use. With implicit relative imports, internal package modules can shadow top-level modules. For instance, a project/json.py file would hide the standard library's json module from all the other modules in the package. You can tell Python you want to forbid implicit relative imports by putting from __future__ import absolute_import at the top of the file. It's the standard behavior in Python 3.
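Tying those recommendations together, process.py could look like this sketch (Python 2.7, as in the question; run it from the outer project directory with python -m project.process):
# project/process.py - a sketch
from __future__ import absolute_import  # forbid implicit relative imports
from . import config                    # explicit relative import

print config.name                       # prints 'brian'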
I'm setting up some code for unittesting. My directory currently looks like this:
project/
    src/
        __init__.py
        sources.py
    test/
        __init__.py
        sources_test.py
In __init__.py for the test directory, I have these two lines:
import sys
sys.path.insert(0, '../')
In the test files, I have the line import src.sources.
When I use nose to run these tests from the project directory, everything works just fine. If I try to run the tests individually it gives me this error:
ImportError: No module named src.sources
I assume that this is because when I run the test from the command line it isn't using __init__.py. Is there a way I can make sure that it will use those lines even when I try to run the tests individually?
I could take the lines out of __init__.py and put them into my test files, but I'm trying to avoid doing that.
To run the tests individually, I am running python sources_test.py.
You're really trying to abuse packages here, and that isn't a good idea.
The simple solution is to not run the tests from within the test directory. Just cd up a level, then do python test/sources_test.py.
Of course that in itself isn't going to import test/__init__.py. For that, you really need to import the package. So python -m test.sources_test is probably a better idea… except, of course, that if your package is made to be run as a script but not to be imported, that won't work.
Alternatively, you could (on POSIX platforms, at least) do PYTHONPATH=.. python sources_test.py from within test. This is a bit hacky, but it should work.
Or, better, combine the above and, from outside of test, do PYTHONPATH=. python test/sources_test.py.
A really hacky workaround is to explicitly import __init__. This should basically work for your simple use case, but everything ends up wrong: in particular, you end up with a module named __init__ instead of one named test, your main module isn't named test.sources_test, and in fact there is no test package at all. And if you accidentally re-import anything after modifying sys.path, you may get duplicate copies of the modules.
If you write
import src.sources
the Python interpreter looks in the src directory for an __init__.py file. If it exists, you can use the directory as a package name. If you are not in your project directory, which is the case when you are in the test directory, then Python looks through the directories in the $PYTHONPATH environment variable (at least on Linux; Windows has an equivalent environment variable, possibly under another name) for a src directory containing an __init__.py file.
Did you set your $PYTHONPATH?
I have two directories in my project:
project/
    src/
    scripts/
"src" contains my polished code, and "scripts" contains one-off Python scripts.
I would like all the scripts to have "../src" added to their sys.path, so that they can access the modules under the "src" tree. One way to do this is to write a scripts/__init__.py file, with the contents:
scripts/__init__.py:
import sys
sys.path.append("../src")
This works, but has the unwanted side-effect of putting all of my scripts in a package called "scripts". Is there some other way to get all my scripts to automatically call the above initialization code?
I could just edit the PYTHONPATH environment variable in my .bashrc, but I want my scripts to work out-of-the-box, without requiring the user to fiddle with PYTHONPATH. Also, I don't like having to make account-wide changes just to accommodate this one project.
Even if you have other plans for distribution, it might be worth putting together a basic setup.py in your src folder. That way, you can run python setup.py develop to have setuptools put a link to your code onto your default path (meaning any changes you make will be reflected in place without having to "reinstall", and all modules will "just work", no matter where your scripts are). It'd be a one-time step, but that's still one more step than zero, so it depends on whether that's more trouble than updating .bashrc. If you use pip, the equivalent is pip install -e /path/to/src.
The more-robust solution--especially if you're going to be mirroring/versioning these scripts on several developers' machines--is to do your development work inside a controlled virtual environment. It turns out virtualenv even has built-in support for making your own bootstrap customizations. It seems like you'd just need an after_install() hook to either tweak sitecustomize, run pip install -e, or add a plain .pth file to site-packages. The custom bootstrap could live in your source control along with the other scripts, and would need to be run once for each developer's setup. You'd also have the normal benefits of using virtualenv (explicit dependency versioning, isolation from system-wide configuration, and standardization between disparate machines, to name a few).
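For reference, the plain .pth file mentioned above is just a text file dropped into the environment's site-packages directory, one path per line; the file name and path here are hypothetical:
# file: site-packages/myproject.pth ('#' lines are skipped by site.py)
/path/to/project/src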
If you really don't want to have any setup steps whatsoever and are willing to only run these scripts from inside the 'project' directory, then you could plop in an __init__.py as such:
project/
    src/
        some_module.py
    scripts/
        __init__.py     # special "magic"
        some_script.py
And these are what your files could look like:
# file: project/src/some_module.py
print("importing %r" % __name__)

def some_function():
    print("called some_function() inside %s" % __name__)
--------------------------------------------------------
# file: project/scripts/some_script.py
import some_module

if __name__ == '__main__':
    some_module.some_function()
--------------------------------------------------------
# file: project/scripts/__init__.py
import sys
from os.path import dirname, abspath, join
print("doing magic!")
sys.path.insert(0, join(dirname(dirname(abspath(__file__))), 'src'))
Then you'd have to run your scripts like so:
[~/project] $ python -m scripts.some_script
doing magic!
importing 'some_module'
called some_function() inside some_module
Beware! The scripts can only be called like this from inside project/:
[~/otherdir] $ python -m scripts.some_script
ImportError: no module named scripts
To enable that, you're back to editing .bashrc, or using one of the options above. The last option should really be a last resort; as @Simon said, you're really fighting the language at that point.
If you want your scripts to be runnable (I assume from the command line), they have to be on the path somewhere.
Something sounds odd about what you're trying to do though. Can you show us an example of exactly what you're trying to accomplish?
You can add a file called 'pathHack.py' in the project dir and put something like this into it:
import os
import sys

pkgDir = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(pkgDir, 'scripts'))
Then, in a python file in your project dir, start by:
import pathHack
And now you can import stuff from the scripts dir without the 'scripts.' prefix. If you have only one file in this directory, and you don't care about hiding this kind of thing, you may inline this snippet.
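A usage sketch, assuming a tool.py sitting next to pathHack.py in the project directory (both names are illustrative):
# file: project/tool.py
import pathHack       # side effect: puts project/scripts on sys.path
import some_script    # a module living in scripts/, imported without
                      # the 'scripts.' prefix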