I'd like to put my tests in a package of their own that can be imported, keeping them outside the src folder while sharing a namespace with the application module (something like my-module.test.unit). Is it even possible? Am I overcomplicating this?
Directory structure:
root/
    src/
        my-module/
    test/
        end-to-end/
        unit/
            helper.py
            test_foo.py
    setup.py
setup.py
setup(
    # ...
    packages=['my-module'],
    package_dir={'': 'src'},
    # ...
)
Possible options I've figured:
Python's Namespace Packages
Change setup.py:
setup(
    # ...
    packages=['my-module', 'my-module.test.unit'],
    package_dir={
        'my-module': 'src',  # doesn't work in editable mode
        'my-module.test.unit': 'test/unit',
    },
    # ...
)
Create another setup.py inside the test folder
The rationale is that I have some helper modules in the same folder as the tests (e.g. test/unit/helper.py) and they are imported using absolute imports (e.g. from helper import Helper). There's nothing necessarily wrong with this setup because it works, but hypothetically a new built-in module could shadow any of mine because built-ins take precedence. I believe that relative imports are better because there is no room left for interpretation (explicit is better than implicit). AFAIK, if my tests were a package by themselves, I could bring those modules in using relative imports (e.g. from .helper import Helper).
In the end it's more a matter of good practice and correctness than a real issue, and because of that, practicality beats purity also applies ;)
References on this:
Importing from builtin library when module with same name exists
PEP 0328 - Rationale for Relative Imports
dirFoo1\
Foo1.py
lib\
bar1.py
dirFoo3
Foo.cpp
dirFoo2
Foo2.py
lib\
bar2.py
Foo1 snippet
# Foo1.py
import lib.bar1
class bar1_class:
    ....
Foo2 snippet
# Foo2.py
import sys
sys.path.append("../dirFoo1/lib")
sys.path.append("../lib")  # tried removing this too
import Foo1
....
Error message in file Foo2.py:
File <path_for_Foo2>
import Foo1
File <path_for_Foo1>
from lib.bar1 import bar1_class
ImportError: No module named bar1
I get this when trying to use the library from dirFoo1\Foo1.py in dirFoo2\Foo2.py. Both Foo1 and Foo2 have their own imports from their own lib subdirectory.
When importing Foo1 in Foo2 with the sys path set,
sys.path.append(dirFoo1)
an import error shows up on this code in Foo1:
import lib.something_from_bar1
It reports that something_from_bar1 doesn't exist.
It seems to be referring to the lib in dirFoo2.
I do have the appropriate __init__.py files in place under dirFoo1 and dirFoo2.
Is there a way to import the library from another folder that is not linked with __init__.py?
If both of your dirFooX folders are on the Python module search path, you will have a conflict between the two lib folders, which each define a different lib package. Only one of them will be accessible to your code. There are a few different ways you can resolve the conflict.
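The conflict can be reproduced in a throwaway directory. This is a minimal sketch: the folder and module names mirror the question, while the WHO constants are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build two sibling folders that each contain a regular package named "lib".
root = tempfile.mkdtemp()
for folder, module in (("dirFoo1", "bar1"), ("dirFoo2", "bar2")):
    lib_dir = os.path.join(root, folder, "lib")
    os.makedirs(lib_dir)
    open(os.path.join(lib_dir, "__init__.py"), "w").close()
    with open(os.path.join(lib_dir, module + ".py"), "w") as f:
        f.write("WHO = %r\n" % module)  # hypothetical content

# Put both folders on the search path, with dirFoo2 ending up first.
sys.path.insert(0, os.path.join(root, "dirFoo1"))
sys.path.insert(0, os.path.join(root, "dirFoo2"))
importlib.invalidate_caches()

import lib.bar2  # found: dirFoo2/lib is the first "lib" on sys.path

try:
    import lib.bar1  # dirFoo1/lib is never consulted
    bar1_found = True
except ImportError:
    bar1_found = False

print("lib.bar2 imported:", lib.bar2.WHO)
print("lib.bar1 importable:", bar1_found)
```

Swapping the order of the two sys.path entries makes lib.bar1 importable and lib.bar2 unreachable instead.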
The simplest would be to rename one or both of the lib folders, and update the import statements to match. The name lib is not very descriptive for a package that is accessible at the top level of the module namespace. Perhaps they should be Foo1Lib and Foo2Lib.
Another option is to make dirFoo1 and dirFoo2 into packages, so that the lib packages are subpackages in two different namespaces. This is probably the best approach, since it shouldn't require any messing around with sys.path to make the imports work. Just make sure you run your script from the top-level folder (above the dirFooX folders), and use the -m flag: run python -m dirFoo2.Foo2 rather than python dirFoo2/Foo2.py. You'll need to update the imports everywhere to use either absolute names (from dirFoo1.lib.bar1 import bar1_class) or explicit relative names (from .lib.bar1 import bar1_class).
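A rough sketch of the package layout, built in a temporary folder so it can be run anywhere. The class bodies and the bar2_class name are invented for the example; only the top-level folder is added to sys.path, which is what running with -m from that folder would do.

```python
import importlib
import os
import sys
import tempfile

def write(path, text=""):
    """Create a file (and its parent folders) with the given text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

top = tempfile.mkdtemp()  # stands in for the folder above dirFoo1 and dirFoo2

# dirFoo1 is a package; Foo1 reaches its own lib with an explicit relative import.
write(os.path.join(top, "dirFoo1", "__init__.py"))
write(os.path.join(top, "dirFoo1", "lib", "__init__.py"))
write(os.path.join(top, "dirFoo1", "lib", "bar1.py"),
      "class bar1_class:\n    name = 'bar1'\n")
write(os.path.join(top, "dirFoo1", "Foo1.py"),
      "from .lib.bar1 import bar1_class\n")

# dirFoo2 is a second package with its own, unrelated lib subpackage.
write(os.path.join(top, "dirFoo2", "__init__.py"))
write(os.path.join(top, "dirFoo2", "lib", "__init__.py"))
write(os.path.join(top, "dirFoo2", "lib", "bar2.py"),
      "class bar2_class:\n    name = 'bar2'\n")
write(os.path.join(top, "dirFoo2", "Foo2.py"),
      "from dirFoo1 import Foo1\nfrom .lib.bar2 import bar2_class\n")

# Only the top-level folder is on the search path.
sys.path.insert(0, top)
importlib.invalidate_caches()

Foo2 = importlib.import_module("dirFoo2.Foo2")
print(Foo2.Foo1.bar1_class.name, Foo2.bar2_class.name)
print(Foo2.Foo1.bar1_class.__module__)
```

The two lib folders no longer compete, because their full names are dirFoo1.lib and dirFoo2.lib.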
My final option is probably not really an appropriate choice for your situation, but it would work. Starting in Python 3.3, you can have "namespace" packages, which can load modules from multiple folders rather than just one. If lib was a namespace package, it would allow the bar1.py and bar2.py files to both show up as part of the lib package inside the interpreter. This is primarily intended for big projects like Django which can have many extra features added on to them by third-party modules. Using namespace packages allows all those add-on modules to be put into a shared namespace. You could make lib a namespace package if you really want to (though as I said above, lib is so generic that it's probably a bad idea to use that name at the top level). All you'd need to do is to delete the __init__.py files in each lib folder. Like I said though, this is mostly intended for large projects where many modules that should be part of a single package are being written and distributed by separate authors. Unless your two lib folders are very closely related, it's probably not an appropriate choice for your situation.
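For completeness, here is a small sketch of the namespace-package behaviour on Python 3.3 or later. The package is called nslib rather than lib (a hypothetical stand-in, chosen only to avoid clashing with anything already importable), and the WHO constants are made up.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for folder, module in (("dirFoo1", "bar1"), ("dirFoo2", "bar2")):
    pkg_dir = os.path.join(root, folder, "nslib")
    os.makedirs(pkg_dir)  # note: no __init__.py is created in either folder
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("WHO = %r\n" % module)  # hypothetical content
    sys.path.append(os.path.join(root, folder))
importlib.invalidate_caches()

# Both modules are visible as parts of one "nslib" package.
import nslib.bar1
import nslib.bar2

print(nslib.bar1.WHO, nslib.bar2.WHO)
print(len(list(nslib.__path__)))  # one entry per contributing folder
```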
My Directory structure in /VM/repo/project is:
__init__.py
scripts/
    getSomething.py
    __init__.py
classes/
    project.py
    db.py
    __init__.py
getSomething.py
from ..classes import project
from ..classes import db
project.py
class PROJECT:
    def __init__(self):
        stuff
db.py
class DB:
    def __init__(self):
        stuff
When I try to run
python getSomething.py
I get the error
Traceback (most recent call last):
  File "scripts/getSomething.py", line 4, in <module>
    from ..classes import project
ValueError: Attempted relative import in non-package
What am I missing here?
As stated in the error, you're running getSomething as a main module. But you can't do package relative imports when you aren't in a package. The main module is never in a package. So, if you were to import getSomething as part of a package...:
# /VM/repo/main.py
from project.scripts import getSomething
Then you would not have the import errors.
Perhaps it is helpful to have a quick discussion on python modules and packages. In general, a file that contains python source code and has the .py extension is a module. Typically that module's name is the name of the file (sans extension), but if you run it directly, that module's name is '__main__'. So far, this is all well known and documented. To import a module you just do import module or import package.module and so on. This last import statement refers to something else (a "package") which we'll talk about now...
Packages are directories that you can import. Many directories can't be imported (e.g. maybe they have no python source files -- modules -- in them). So to resolve this ambiguity, there is also the requirement that the directory has an __init__.py file. When you import a directory on your filesystem, python actually imports the __init__.py module and creates the associated package from the things in __init__.py (if there are any).
Putting all of this together shows why executing a file inside a directory that has an __init__.py is not enough for python to consider the module to be part of a package. First, the name of the module is __main__, not package.filename_sans_extension. Second, the creation of a package depends not just on the filesystem structure, but on whether the directory (and therefore __init__.py) was actually imported.
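The two points can be observed directly. The sketch below assumes Python 3.7 or later and uses a hypothetical package/script.py created in a temporary folder; the same file is first executed directly and then imported through its package.

```python
import os
import subprocess
import sys
import tempfile

top = tempfile.mkdtemp()
pkg = os.path.join(top, "package")  # hypothetical package name
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "script.py"), "w") as f:
    f.write("print(__name__, repr(__package__))\n")

def run(args):
    """Run a fresh interpreter in `top` and return what it printed."""
    result = subprocess.run([sys.executable] + args, cwd=top,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

as_script = run([os.path.join("package", "script.py")])
as_import = run(["-c", "import package.script"])

print("executed as a file:", as_script)
print("imported:          ", as_import)
```

When executed as a file, the module reports the name __main__ and no package; when imported, it reports package.script and belongs to package.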
You might be asking yourself "Why did they design it this way?" Indeed, I've asked myself that same question on occasion. I think the reason is that the language designers want certain guarantees to be in place for a package. A package should be a unit of things which are designed to work together for a specific purpose. It shouldn't be a silo of scripts that by virtue of a few haphazard __init__.py files gain the ability to walk around the filesystem to find the other modules that they need. If something is designed to be run as a main module, then it probably shouldn't be part of the package. Rather it should be a separate module that imports the package and relies on it.
I am just starting with python and have trouble understanding the search path for intra-package module loads. I have a structure like this:
top/                        Top-level package
    __init__.py             Initialize the top package
    src/                    Subpackage for source files
        __init__.py
        pkg1/               Source subpackage 1
            __init__.py
            mod1_1.py
            mod1_2.py
            ...
        pkg2/               Source subpackage 2
            __init__.py
            mod2_1.py
            mod2_2.py
            ...
        ...
    test/                   Subpackage for unit testing
        __init__.py
        pkg1Test/           Tests for subpackage1
            __init__.py
            testSuite1_1.py
            testSuite1_2.py
            ...
        pkg2Test/           Tests for subpackage2
            __init__.py
            testSuite2_1.py
            testSuite2_2.py
            ...
        ...
In testSuite1_1 I need to import module mod1_1.py (and so on). What import statement should I use?
Python's official tutorial (at docs.python.org, sec 6.4.2) says:
"If the imported module is not found in the current package (the package of which the current module is a submodule), the import statement looks for a top-level module with the given name."
I took this to mean that I could use (from within testSuite1_1.py):
from src.pkg1 import mod1_1
or
import src.pkg1.mod1_1
Neither works. I read several answers to similar questions here, but could not find a solution.
Edit: I changed the module names to follow Python's naming conventions. But I still cannot get this simple example to work.
The module name doesn't include the .py extension. Also, in your example, the top-level package is actually named top. And finally, hyphens aren't legal in Python names; I'd suggest replacing them with underscores. Then try:
from top.src.pkg1 import mod1_1
Problem solved with the help of http://legacy.python.org/doc/essays/packages.html (referred to in a similar question). The key point (perhaps obvious to more experienced python developers) is this:
"In order for a Python program to use a package, the package must be findable by the import statement. In other words, the package must be a subdirectory of a directory that is on sys.path. [...] the easiest way to ensure that a package was on sys.path was to either install it in the standard library or to have users extend sys.path by setting their $PYTHONPATH shell environment variable"
Adding the path to "top" to PYTHONPATH solved the problem. To make the solution portable (this is a personal project, but I need to share it across several machines), I guess having minimal initialization code in top/setup.py should work.
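A small sketch of what that amounts to, assuming the directory placed on the search path is the one that contains top. The layout is recreated in a temporary folder, the VALUE constant is made up, and sys.path is modified in-process as a stand-in for setting PYTHONPATH before starting Python.

```python
import importlib
import os
import sys
import tempfile

parent = tempfile.mkdtemp()  # the directory that contains "top"
top_dir = os.path.join(parent, "top")
src_dir = os.path.join(top_dir, "src")
pkg1_dir = os.path.join(src_dir, "pkg1")
os.makedirs(pkg1_dir)

# Every level of the package hierarchy gets an __init__.py.
for directory in (top_dir, src_dir, pkg1_dir):
    open(os.path.join(directory, "__init__.py"), "w").close()
with open(os.path.join(pkg1_dir, "mod1_1.py"), "w") as f:
    f.write("VALUE = 42\n")  # hypothetical content

# Same effect as listing this directory in PYTHONPATH.
sys.path.insert(0, parent)
importlib.invalidate_caches()

from top.src.pkg1 import mod1_1

print(mod1_1.__name__, mod1_1.VALUE)
```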
I am developing several Python projects for several customers at the same time. A simplified version of my project folder structure looks something like this:
/path/
    to/
        projects/
            cust1/
                proj1/
                    pack1/
                        __init__.py
                        mod1.py
                proj2/
                    pack2/
                        __init__.py
                        mod2.py
            cust2/
                proj3/
                    pack3/
                        __init__.py
                        mod3.py
When I, for example, want to use functionality from proj1, I extend sys.path by /path/to/projects/cust1/proj1 (e.g. by setting PYTHONPATH, adding a .pth file to the site-packages folder, or even modifying sys.path directly) and then import the module like this:
>>> from pack1.mod1 import something
As I work on more projects, it happens that different projects have identical package names:
/path/
    to/
        projects/
            cust3/
                proj4/
                    pack1/    <-- same package name as in cust1/proj1 above
                        __init__.py
                        mod4.py
If I now simply extend sys.path by /path/to/projects/cust3/proj4, I still can import from proj1, but not from proj4:
>>> from pack1.mod1 import something
>>> from pack1.mod4 import something_else
ImportError: No module named mod4
I think the reason why the second import fails is that Python only searches the first folder in sys.path where it finds a pack1 package and gives up if it does not find the mod4 module in there. I've asked about this in an earlier question, see import python modules with the same name, but the internal details are still unclear to me.
Anyway, the obvious solution is to add another layer of namespace qualification by turning project directories into super packages: Add __init__.py files to each proj* folder and remove these folders from the lines by which sys.path is extended, e.g.
$ export PYTHONPATH=/path/to/projects/cust1:/path/to/projects/cust3
$ touch /path/to/projects/cust1/proj1/__init__.py
$ touch /path/to/projects/cust3/proj4/__init__.py
$ python
>>> from proj1.pack1.mod1 import something
>>> from proj4.pack1.mod4 import something_else
Now I am running into a situation where different projects for different customers have the same name, e.g.
/path/
    to/
        projects/
            cust3/
                proj1/    <-- same project name as for cust1 above
                    __init__.py
                    pack4/
                        __init__.py
                        mod4.py
Trying to import from mod4 does not work anymore for the same reason as before:
>>> from proj1.pack4.mod4 import yet_something_else
ImportError: No module named pack4.mod4
Following the same approach that solved this problem before, I would add yet another package / namespace layer and turn customer folders into super super packages.
However, this clashes with other requirements I have for my project folder structure, e.g.
a Development / Release structure to maintain several code lines
other kinds of source code, e.g. JavaScript, SQL, etc.
files other than source files, e.g. documents or data.
A less simplified, more real-world depiction of some project folders looks like this:
/path/
    to/
        projects/
            cust1/
                proj1/
                    Development/
                        code/
                            javascript/
                                ...
                            python/
                                pack1/
                                    __init__.py
                                    mod1.py
                        doc/
                            ...
                    Release/
                        ...
                proj2/
                    Development/
                        code/
                            python/
                                pack2/
                                    __init__.py
                                    mod2.py
I don't see how I can satisfy the requirements the Python interpreter places on a folder structure and my own requirements at the same time. Maybe I could create an extra folder structure with some symbolic links and use that in sys.path, but looking at the effort I'm already making, I have a feeling that there is something fundamentally wrong with my entire approach. On a side note, I also have a hard time believing that Python really restricts my choice of source code folder names as much as it seems to in the case depicted.
How can I set up my project folders and sys.path so I can import from all projects in a consistent manner if there are projects and packages with identical names?
This is the solution to my problem, albeit it might not be obvious at first.
In my projects, I have now introduced a convention of one namespace per customer. In every customer folder (cust1, cust2, etc.), there is an __init__.py file with this code:
import pkgutil
__path__ = pkgutil.extend_path(__path__, __name__)
All the other __init__.py files in my packages are empty (mostly because I haven't had the time yet to find out what else to do with them).
As explained here, extend_path makes sure Python is aware there is more than one sub-package within a package, physically located elsewhere and - from what I understand - the interpreter then does not stop searching after it fails to find a module under the first package path it encounters in sys.path, but searches all paths in __path__.
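The effect can be checked with a small experiment. The sketch below builds two separate source trees (tree_a and tree_b, hypothetical names) that each contain a cust3 package with the two-line __init__.py shown above; the ORIGIN constants are made up so that each module reports where it came from.

```python
import importlib
import os
import sys
import tempfile

INIT = "import pkgutil\n__path__ = pkgutil.extend_path(__path__, __name__)\n"

def write(path, text=""):
    """Create a file (and its parent folders) with the given text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

root = tempfile.mkdtemp()
# Two separate source trees that both provide a piece of the "cust3" package.
for tree, proj, pack in (("tree_a", "proj4", "pack1"),
                         ("tree_b", "proj1", "pack4")):
    base = os.path.join(root, tree)
    write(os.path.join(base, "cust3", "__init__.py"), INIT)
    write(os.path.join(base, "cust3", proj, "__init__.py"))
    write(os.path.join(base, "cust3", proj, pack, "__init__.py"))
    write(os.path.join(base, "cust3", proj, pack, "mod4.py"),
          "ORIGIN = %r\n" % tree)
    sys.path.append(base)
importlib.invalidate_caches()

from cust3.proj4.pack1 import mod4 as first
from cust3.proj1.pack4 import mod4 as second
import cust3

print(first.ORIGIN, second.ORIGIN)
print(len(cust3.__path__))  # both cust3 folders are searched
```

Without the extend_path line, the second import would fail, because only the cust3 folder found first on sys.path would be searched.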
I can now access all code in a consistent manner criss-cross between all projects, e.g.
from cust1.proj1.pack1.mod1 import something
from cust3.proj4.pack1.mod4 import something_else
from cust3.proj1.pack4.mod4 import yet_something_else
On a downside, I had to create an even deeper project folder structure:
/path/
    to/
        projects/
            cust1/
                proj1/
                    Development/
                        code/
                            python/
                                cust1/
                                    __init__.py    <--- contains code as described above
                                    proj1/
                                        __init__.py    <--- empty
                                        pack1/
                                            __init__.py    <--- empty
                                            mod1.py
but that seems very acceptable to me, especially considering how little effort I need to make to maintain this convention. sys.path is extended by /path/to/projects/cust1/proj1/Development/code/python for this project.
On a sidenote, I noticed that of all the __init__.py files for the same customer, the one in the path that appears first in sys.path is executed, no matter from which project I import something.
You should be using the excellent virtualenv and virtualenvwrapper tools.
What happens if you accidentally import code from one customer/project in another and don't notice? When you deliver it will almost certainly fail. I would adopt a convention of having PYTHONPATH set up for one project at a time, and not try to have everything you've ever written be importable at once.
You can use a wrapper script per-project to set PYTHONPATH and start python, or use scripts to switch environments when you switch projects.
Of course some projects will have dependencies on other projects (those libraries you mentioned), but if you intend for the customer to be able to import several projects at once then you have to arrange for the names to not clash. You can only have this problem when you have multiple projects on the PYTHONPATH that aren't supposed to be used together.
I use __init__.py in my project with the following structure:
project\
    project.py
    cfg.py
    __init__.py
    database\
        data.py
        __init__.py
    test\
        test_project.py
        __init__.py
All is OK when I need to see database\ modules in project.py with
from database.data import *
But if I need to have some test code inside test_project.py, how can I 'see' the database\ modules?
You have 3 options:
use relative imports (from ..database import data). I wouldn't recommend that one.
append paths to sys.path in your code.
use addsitedir() and .pth files. Here is how.
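A minimal sketch of the third option, using site.addsitedir() from the standard library. All folder, file, and module names here are hypothetical and live in a temporary directory: a .pth file lists one directory per line, and addsitedir() appends each listed directory that exists to sys.path.

```python
import os
import site
import sys
import tempfile

root = tempfile.mkdtemp()
code_dir = os.path.join(root, "project_parent")  # hypothetical: holds importable code
site_dir = os.path.join(root, "my_site")         # hypothetical: holds the .pth file
os.makedirs(code_dir)
os.makedirs(site_dir)

with open(os.path.join(code_dir, "pth_demo_module.py"), "w") as f:
    f.write("ANSWER = 'found via .pth'\n")

# Each line of a .pth file names a directory to add to sys.path.
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write(code_dir + "\n")

# Adds site_dir itself and processes the .pth files inside it.
site.addsitedir(site_dir)

import pth_demo_module

print(pth_demo_module.ANSWER)
```

The second option is the same idea without the indirection: calling sys.path.append(code_dir) before the import.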
Relative imports.
from ..database import data
If you run a script from the directory that contains project\, you can simply do from project.database.data import *, in test_project.py.
This is generally a good idea, because PEP 8, as it read at the time of writing, discouraged relative imports:
"Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 [7] is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable."
Absolute imports like the one given above are encouraged.