I am developing several Python projects for several customers at the same time. A simplified version of my project folder structure looks something like this:
/path/
    to/
        projects/
            cust1/
                proj1/
                    pack1/
                        __init__.py
                        mod1.py
                proj2/
                    pack2/
                        __init__.py
                        mod2.py
            cust2/
                proj3/
                    pack3/
                        __init__.py
                        mod3.py
When, for example, I want to use functionality from proj1, I add /path/to/projects/cust1/proj1 to sys.path (e.g. by setting PYTHONPATH, adding a .pth file to the site-packages folder, or modifying sys.path directly) and then import the module like this:
>>> from pack1.mod1 import something
As I work on more projects, it happens that different projects have identical package names:
/path/
    to/
        projects/
            cust3/
                proj4/
                    pack1/    <-- same package name as in cust1/proj1 above
                        __init__.py
                        mod4.py
If I now simply add /path/to/projects/cust3/proj4 to sys.path as well, I can still import from proj1, but not from proj4:
>>> from pack1.mod1 import something
>>> from pack1.mod4 import something_else
ImportError: No module named mod4
I think the second import fails because Python searches only the first folder in sys.path where it finds a pack1 package and gives up if it does not find the mod4 module in there. I've asked about this in an earlier question (see import python modules with the same name), but the internal details are still unclear to me.
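One way to see this in action is to inspect the package's __path__ after importing it; for a regular package (one with an __init__.py) it contains just the single directory where Python first found the package (a diagnostic sketch using the example paths from above):

import sys
sys.path[:0] = ["/path/to/projects/cust1/proj1", "/path/to/projects/cust3/proj4"]

import pack1
# prints only ['/path/to/projects/cust1/proj1/pack1'] -- the search
# stopped at the first pack1 it found, so proj4's pack1 stays invisible
print(pack1.__path__)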
Anyway, the obvious solution is to add another layer of namespace qualification by turning the project directories into super-packages: add an __init__.py file to each proj* folder and put the customer folders, rather than the project folders, on sys.path, e.g.
$ export PYTHONPATH=/path/to/projects/cust1:/path/to/projects/cust3
$ touch /path/to/projects/cust1/proj1/__init__.py
$ touch /path/to/projects/cust3/proj4/__init__.py
$ python
>>> from proj1.pack1.mod1 import something
>>> from proj4.pack1.mod4 import something_else
Now I am running into a situation where different projects for different customers have the same name, e.g.
/path/
    to/
        projects/
            cust3/
                proj1/    <-- same project name as for cust1 above
                    __init__.py
                    pack4/
                        __init__.py
                        mod4.py
Trying to import from mod4 does not work anymore for the same reason as before:
>>> from proj1.pack4.mod4 import yet_something_else
ImportError: No module named pack4.mod4
Following the approach that solved this problem before, I would add yet another package/namespace layer and turn the customer folders into super-super-packages.
However, this clashes with other requirements I have for my project folder structure, e.g.:
a Development/Release structure to maintain several lines of development
other kinds of source code, e.g. JavaScript, SQL, etc.
files other than source files, e.g. documents or data.
A less simplified, more real-world depiction of some project folders looks like this:
/path/
    to/
        projects/
            cust1/
                proj1/
                    Development/
                        code/
                            javascript/
                                ...
                            python/
                                pack1/
                                    __init__.py
                                    mod1.py
                        doc/
                            ...
                    Release/
                        ...
                proj2/
                    Development/
                        code/
                            python/
                                pack2/
                                    __init__.py
                                    mod2.py
I don't see how I can simultaneously satisfy the requirements the Python interpreter imposes on a folder structure and the requirements I have myself. Maybe I could create an extra folder structure with symbolic links and put that on sys.path, but looking at the effort I'm already making, I have a feeling that there is something fundamentally wrong with my entire approach. On a side note, I also find it hard to believe that Python really restricts my choice of source code folder names as much as it seems to in the case depicted.
How can I set up my project folders and sys.path so that I can import from all projects in a consistent manner when there are projects and packages with identical names?
This is the solution to my problem, although it might not be obvious at first.
In my projects, I have now introduced a convention of one namespace per customer. In every customer folder (cust1, cust2, etc.), there is an __init__.py file with this code:
import pkgutil
__path__ = pkgutil.extend_path(__path__, __name__)
All the other __init__.py files in my packages are empty (mostly because I haven't had the time yet to find out what else to do with them).
As explained here, extend_path makes Python aware that there is more than one sub-package within a package, physically located elsewhere, and, from what I understand, the interpreter then does not stop searching after it fails to find a module under the first package path it encounters in sys.path, but instead searches all paths in __path__.
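For illustration, once the convention is in place, the customer package's __path__ should contain one entry per project (a small sketch; the exact entries depend on which project paths are on sys.path):

import cust1
# with extend_path in place, __path__ lists every cust1/ package directory
# found along sys.path -- one per project in this setup
print(cust1.__path__)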
I can now access all code in a consistent manner criss-cross between all projects, e.g.
from cust1.proj1.pack1.mod1 import something
from cust3.proj4.pack1.mod4 import something_else
from cust3.proj1.pack4.mod4 import yet_something_else
On the downside, I had to create an even deeper project folder structure:
/path/
    to/
        projects/
            cust1/
                proj1/
                    Development/
                        code/
                            python/
                                cust1/
                                    __init__.py        <--- contains code as described above
                                    proj1/
                                        __init__.py    <--- empty
                                        pack1/
                                            __init__.py    <--- empty
                                            mod1.py
but that seems very acceptable to me, especially considering how little effort it takes to maintain this convention. For this project, sys.path is extended by /path/to/projects/cust1/proj1/Development/code/python.
On a side note, I noticed that of all the __init__.py files for the same customer, only the one in the path that appears first in sys.path is executed, no matter from which project I import something.
You should be using the excellent virtualenv and virtualenvwrapper tools.
What happens if you accidentally import code from one customer's project in another and don't notice? When you deliver, it will almost certainly fail. I would adopt a convention of setting up PYTHONPATH for one project at a time, and not try to make everything you've ever written importable at once.
You can use a per-project wrapper script to set PYTHONPATH and start Python, or use scripts to switch environments when you switch projects.
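A minimal wrapper along these lines (a sketch; the project path is the example path from the question, and the file name is hypothetical):

#!/usr/bin/env python
# run_proj1.py -- point PYTHONPATH at one project, then start Python
import os
import subprocess
import sys

env = dict(os.environ)
# overwrites any existing PYTHONPATH for simplicity
env["PYTHONPATH"] = "/path/to/projects/cust1/proj1"
# forward any arguments, so this works for the REPL as well as for scripts
sys.exit(subprocess.call([sys.executable] + sys.argv[1:], env=env))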
Of course some projects will have dependencies on other projects (those libraries you mentioned), but if you intend for the customer to be able to import several projects at once, then you have to arrange for the names not to clash. You only have this problem when multiple projects that aren't supposed to be used together sit on the PYTHONPATH at the same time.
Related
Consider the following Python project skeleton:
proj/
├── foo
│ └── __init__.py
├── README.md
└── scripts
└── run.py
In this case foo holds the main project files, for example
# foo/__init__.py
class Foo():
    def run(self):
        print('Running...')
And scripts holds auxiliary scripts that need to import files from foo, which are then invoked via:
[~/proj]$ python scripts/run.py
There are two ways of importing Foo which both fail:
If a relative import is attempted (from ..foo import Foo), the error is ValueError: attempted relative import beyond top-level package
If an absolute import is attempted (from foo import Foo), the error is ModuleNotFoundError: No module named 'foo'
My current workaround is to append the running path to sys.path:
import sys
sys.path.append('.')
from foo import Foo
Foo().run()
But this feels like a hack, and has to be added to every new script in scripts/.
Is there a better way to structure scripts in such projects?
There are two ways you could resolve this.
(1) Turn your project into an installable package
Add a proj/setup.py file with the following contents:
import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    author="You",
    author_email="you@example.com",
    description="This is my project",
    packages=["foo"],
)
create a virtualenv:
python3 -m venv virtualenv # this creates a directory "virtualenv" in your project
source ./virtualenv/bin/activate # this switches you into the new environment
python setup.py develop # this places your "foo" package in the environment
Inside the virtualenv, foo behaves as an installed package and is importable via import foo.
So you can use absolute imports in your scripts.
To make them run from anywhere, without needing to activate the virtualenv, you can specify the virtualenv's interpreter in a shebang.
In scripts/run.py (the first line is important):
#!/path/to/proj/virtualenv/bin/python
import foo
print(foo.callfunc())
(2) Make the scripts part of the foo package
Instead of a separate subdirectory scripts, make a subpackage. In proj/foo/commands/run.py:
from .. import callfunc

def main():
    print(callfunc())

if __name__ == "__main__":
    main()
Then execute the script from the top-level proj/ directory with:
python -m foo.commands.run
If you combine this with (1) and install your package, you can then run python -m foo.commands.run from anywhere.
Solution
There are multiple ways to achieve this. Both require creating a Python package by adding a setup.py (building on @matejcik's answer).
Option 1 (recommended): entry_points + console_scripts: register a function in your project as the entry point for script execution (e.g. run = foo.cli:run); see the sketch below.
Option 2: scripts: use this keyword argument in the setup() call to reference the path of your script (e.g. bin/script.py).
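A sketch of option 1, extending the setup.py from the earlier answer (the module path foo.cli and the function run are hypothetical placeholders):

import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    packages=["foo"],
    entry_points={
        "console_scripts": [
            # installs a "run" executable that calls foo.cli:run()
            "run = foo.cli:run",
        ],
    },
)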
Note
I recommend using a CLI library/framework like Click so that your codebase is only concerned with maintaining application-specific business logic rather than CLI framework plumbing. Also, Click recommends the entry_points + console_scripts method of script integration due to cross-platform compatibility.
Setup Tools - Automatic script creation: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
Setup Tools - keyword arguments: https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords
Click GitHub: https://github.com/pallets/click/
Click Setuptools integration: https://click.palletsprojects.com/en/master/setuptools/
Best practice? Put a single entry-point in the root
I know this might sound absurd if you have lots of scripts you want to be able to execute... but it's actually the cleanest option, and it's the one most often used in big Python projects, like manage.py in Django, for example. It also doesn't need to be a huge undertaking. Even more importantly, it is always more secure to have a single entry point than several smaller ones.
proj/
├── run.py
├── foo
│ └── __init__.py
├── README.md
└── scripts
└── my_script.py
When run.py lives in the root directory, it can be very lightweight... basically just a wrapper to call the function you need from my_script.py. It ties everything together, so now all of your imports just work.
Just keep in mind that your entry point is your root. The parent of the root doesn't exist. So put your entry point in the root, and then import packages relative to the root, i.e. import foo and import scripts from there.
But how do I call multiple scripts!?
If you need to be able to call multiple scripts, this is a good argument for... well... arguments! Keep run.py as your single entry point/command, and leverage subcommands to dispatch to the script you care about, as sketched below.
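A minimal sketch of such a dispatcher (it assumes the script modules expose main() functions; the subcommand name is illustrative):

#!/usr/bin/env python
# proj/run.py -- single entry point that dispatches to subcommands
import argparse

from scripts import my_script  # resolves because run.py lives in the project root


def main():
    parser = argparse.ArgumentParser(description="Project entry point")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("my-script", help="run scripts/my_script.py")
    args = parser.parse_args()
    if args.command == "my-script":
        my_script.main()  # assumes my_script defines a main() function


if __name__ == "__main__":
    main()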
Reinventing the wheel?
Generally, frameworks have already built this architecture for you so that you can add your own subcommands: Django does, and, for a smaller footprint, so does Flask.
You can easily wrap up a small project without that help, though, as I've illustrated.
Security
No one ever wishes their code were less refactorable after a few years of working with it. No one ever wishes their codebase had less security. As we move toward more secure systems in general, it would make sense to create a gatekeeper script that determines what is and isn't a safe operation and by whom. Moving the code to an LDAP-based system and need to lock things down by group? No problem. You can either change the single file or add LDAP security in your codebase, even creating your own internal API.
With distributed scripts, security options are much less flexible and much harder to maintain, and a single vulnerability could leave you wide open to exploit.
Bonus advantage
You're adding abstraction to your script base. If you ever want to change the structure of your codebase (maybe you want scripts to have subfolders with more organization), you/your users don't need to do any refactoring for any dependencies, or change paths to longer, more verbose names. Your package is self-contained, and the only thing a user will ever need to touch is your proj/run.py entry-point.
And, obviously, you don't need to play with Python paths as much!
You need to add __init__.py files to the scripts and proj folders for them to be considered Python packages and for you to be able to import from them.
Another common approach is to place your foo and scripts folders into a proj/src folder, which then has an __init__.py file and is thus a Python package.
If you like simplicity, and there are no additional restrictions beyond what you asked, add an __init__.py to the scripts folder and to any other sibling folders to make them packages. Then always use the absolute import form (since you said you do not want proj as a parent package of those, there is no __init__.py there), and call your scripts from inside the proj folder with:
python -m scripts.run
or whatever name you give to scripts other than run.py.
This is similar to option 2 of @matejcik's answer, but even simpler.
Another solution is to add a .pth file to your Python site-packages directory with the following contents:
# your.pth
# ↓ the directory of proj
C:\...\proj
Then, in your script:
# scripts.py
from foo import Foo
Foo().run()
It will work well.
Note: if your IDE is PyCharm, you can also use its Source Roots feature to help you.
Python looks for packages/modules in the directories listed in sys.path. There are several ways of ensuring that a directory of interest, in this case proj, is one of those directories:
Move your scripts to the proj directory. Python adds the directory containing the input script to sys.path.
Put the directory proj into the contents of the PYTHONPATH environment variable.
Make the module part of an installable package and install it, either in a virtual environment or not.
At run time, dynamically add the directory proj to sys.path.
Option 1 is the most logical and requires no source changes. If you are afraid that might break something, you can perhaps make scripts a symbolic link pointing back to proj?
If you are unwilling to do that, then ...
You may consider it a hack, but I would still recommend that you modify your scripts to update sys.path at runtime. However, append an absolute path, so that the scripts can be executed regardless of what the current directory is. In your case, directory proj is the parent of directory scripts, where the scripts reside, so:
import sys
import os.path

# the parent of this script's directory, i.e. proj
parent_directory = os.path.split(os.path.dirname(__file__))[0]
if parent_directory not in sys.path:
    # sys.path.insert(0, parent_directory)  # index 0 is the directory of the running script, so consider inserting at index 1 instead
    sys.path.append(parent_directory)
I want to ask you about something that came to my mind while working on a project.
I have the following structure:
src
- __init__.py
- class1.py
+ folder2
  - __init__.py
  - class2.py
In class2.py I want to import class1 to use it. Obviously, I cannot use
from src.class1 import Class1
because it will produce an error. A workaround that works for me is to define the following in the __init__.py inside folder2:
import sys
sys.path.append('src')
My question is whether this option is valid and a good idea to use, or whether there are better solutions.
Another question. Imagine that the project structure is:
src
- __init__.py
- class1.py
+ folder2
  - __init__.py
  - class2.py
+ errorsFolder
  - __init__.py
  - errors.py
In class1:
from errorsFolder.errors import Errors
This works fine. But if I try to do the same in class2, which is at the same level as errorsFolder:
from src.errorsFolder.errors import Errors
It fails (ImportError: No module named src.errorsFolder.errors)
Thank you in advance!
Despite the fact that it's slightly shocking to have to import a "parent" module in your package, your workaround depends on the current directory from which you run your application, which is bad.
import sys
sys.path.append('src')
should be
import sys, os
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), os.pardir))
This adds the parent directory of your current module's directory, regardless of the current directory you run your application from (your module may be imported by several applications, which don't all have to be run from the same directory).
One correct way to solve this is to set the environment variable PYTHONPATH to the path which contains src. Then import src.class1 will always work, regardless of which directory you start in.
from ..class1 import Class1 should work (at least it does here in a similar layout, using python 2.7.x).
As a general rule: messing with sys.path is usually a very bad idea, especially if it allows the same module to be imported from two different paths (which would be the case with your file layout).
Also, you may want to think twice about your current layout. Python is not Java and doesn't require (nor even encourage) a "one class per module" approach. If both classes need to work together, they might be better off in the same module, or at least in modules at the same level of the package tree (note that you can use the top-level package's __init__ as a facade to provide direct access to objects defined in submodules/subpackages; a sketch follows). NB: I'm not saying that your current layout is necessarily wrong, just that it might not be the simplest one.
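A minimal sketch of the facade idea against the layout above:

# src/__init__.py -- re-export the public names so client code
# doesn't need to know the internal layout
from .class1 import Class1
from .errorsFolder.errors import Errors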
There is also a solution that does not involve the use of the environment variable PYTHONPATH.
In src/ create setup.py with these contents:
from setuptools import setup
setup()
Now you can do pip install -e /path/to/src and import to your heart's content.
-e, --editable <path/url> Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
No, it's not good. Python locates modules in two ways:
Python looks for modules and packages in the directories on sys.path, which includes the contents of $PYTHONPATH.
Refer: https://docs.python.org/2/using/cmdline.html#envvar-PYTHONPATH
To find out what is on the search path, run the following code in Python 3:
import sys
print(sys.path)
All folders containing an __init__.py are treated as Python packages (if they are subdirectories of a directory on PYTHONPATH).
These two mechanisms cover everything you need to set up a Python project.
I have a project with the following file structure:
root/
    run.py
    bot/
        __init__.py
        my_discord_bot.py
        dice/
            __init__.py
            dice.py
            # dice files
        help/
            __init__.py
            help.py
            # help files
        parser/
            __init__.py
            parser.py
            # other parser files
The program is run from within the root directory by calling python run.py. run.py imports bot.my_discord_bot and then makes use of a class defined there.
The file bot/my_discord_bot.py has the following import statements:
import dice.dice as d
import help.help as h
import parser.parser as p
On Linux, all three import statements execute correctly. On Windows, the first two seem to execute fine, but then on the third I'm told:
ImportError: No module named 'parser.parser'; 'parser' is not a package
Why does it break on the third import statement, and why does it only break on Windows?
Edit: clarifies how the program is run
Make sure that your parser is not shadowing a built-in or third-party package/module/library.
I am not 100% sure about the specifics of how this name conflict is resolved, but it seems you can potentially (a) have your module overridden by the existing module (which may be what is happening in your Windows case), or (b) override the existing module, which could cause bugs down the road. It seems like (b) is what commonly trips people up.
If you think this might be happening with one of your modules (which seems fairly likely with a name like parser), try renaming your module.
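One quick way to check which module a bare name actually resolves to (a diagnostic sketch, Python 3):

import importlib.util

spec = importlib.util.find_spec("parser")
# if origin points into the standard library rather than into your project,
# your local parser package is being shadowed (or is shadowing the stdlib module)
print(spec.origin if spec else "parser not found")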
See this very nice article for more details and more common Python "import traps".
Put run.py outside the root folder, so that run.py sits next to the root folder; then create an __init__.py inside the root folder, and change the imports to:
import root.parser.parser as p
Or just rename your parser module.
In any case, you should be careful with naming, because you can easily break your own code someday.
There are a lot of threads on importing modules from sibling directories, and the majority recommend either simply adding __init__.py to the source tree, or modifying sys.path from inside those __init__.py files.
Suppose I have following project structure:
project_root/
__init__.py
wrappers/
__init__.py
wrapper1.py
wrapper2.py
samples/
__init__.py
sample1.py
sample2.py
All __init__.py files contain code which inserts the absolute path of the project_root/ directory into sys.path. I get "No module named x" no matter how I try to import the wrapperX modules into sampleX. And when I print sys.path from sampleX, it turns out not to contain the path to project_root.
So how do I use __init__.py correctly to set up the project's import paths?
Do not run sampleX.py directly; execute it as a module instead:
# (in project root directory)
python -m samples.sample1
This way you do not need to fiddle with sys.path at all (which is generally discouraged). It also makes it much easier to use the samples/ package as a library later on.
Oh, and __init__.py is not run because it only gets run/imported (which is more or less the same thing) if you import the samples package, not if you run an individual file as a script.
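With that layout, a sample can then use a plain absolute import (a sketch; do_something() is a hypothetical function in wrapper1):

# samples/sample1.py
from wrappers import wrapper1  # resolves because python -m is run from the project root, which is on sys.path


def main():
    wrapper1.do_something()


if __name__ == "__main__":
    main()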
I am just starting with Python and have trouble understanding the search path for intra-package module loads. I have a structure like this:
top/                        Top-level package
    __init__.py             Initialize the top package
    src/                    Subpackage for source files
        __init__.py
        pkg1/               Source subpackage 1
            __init__.py
            mod1_1.py
            mod1_2.py
            ...
        pkg2/               Source subpackage 2
            __init__.py
            mod2_1.py
            mod2_2.py
            ...
        ...
    test/                   Subpackage for unit testing
        __init__.py
        pkg1Test/           Tests for subpackage1
            __init__.py
            testSuite1_1.py
            testSuite1_2.py
            ...
        pkg2Test/           Tests for subpackage2
            __init__.py
            testSuite2_1.py
            testSuite2_2.py
            ...
        ...
In testSuite1_1 I need to import module mod1_1.py (and so on). What import statement should I use?
Python's official tutorial (at docs.python.org, sec 6.4.2) says:
"If the imported module is not found in the current package (the package of which the current module is a submodule), the import statement looks for a top-level module with the given name."
I took this to mean that I could use (from within testSuite1_1.py):
from src.pkg1 import mod1_1
or
import src.pkg1.mod1_1
Neither works. I have read several answers to similar questions here, but could not find a solution.
Edit: I changed the module names to follow Python's naming conventions. But I still cannot get this simple example to work.
The module name doesn't include the .py extension. Also, in your example, the top-level package is actually named top. And finally, hyphens aren't legal in Python names; I'd suggest replacing them with underscores. Then try:
from top.src.pkg1 import mod1_1
Problem solved with the help of http://legacy.python.org/doc/essays/packages.html (referred to in a similar question). The key point (perhaps obvious to more experienced Python developers) is this:
"In order for a Python program to use a package, the package must be findable by the import statement. In other words, the package must be a subdirectory of a directory that is on sys.path. [...] the easiest way to ensure that a package was on sys.path was to either install it in the standard library or to have users extend sys.path by setting their $PYTHONPATH shell environment variable"
Adding the path to top to PYTHONPATH solved the problem. To make the solution portable (this is a personal project, but I need to share it across several machines), I guess having minimal initialization code in top/setup.py should work; a sketch follows.
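A minimal sketch of that idea (note: for find_packages() to pick up the top package itself, this setup.py would sit in the directory that contains top/; an editable install, pip install -e ., as suggested in an earlier answer then makes it importable everywhere):

# setup.py, placed in the directory that contains top/ (a sketch)
from setuptools import setup, find_packages

setup(
    name="top",
    version="0.1.0",
    packages=find_packages(),  # picks up top, top.src, top.src.pkg1, ...
)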