I have a directory structure like this:
project/
|  __init__.py
|  project.py
|  src/
|  |  __init__.py
|  |  class_one.py
|  |  class_two.py
|  test/
|  |  __init__.py
|  |  test_class_one.py
project.py just instantiates ClassOne and runs it.
My problem is with the tests: I don't know how to import the src classes. I've tried importing them in these ways and got nowhere:
from project.src.class_one import ClassOne
and
from ..src.class_one import ClassOne
What am I doing wrong? Is there a better directory structure?
----- EDIT -----
I changed my dir structure, it's like this now:
Project/
|  project.py
|  project/
|  |  __init__.py
|  |  class_one.py
|  |  class_two.py
|  test/
|  |  __init__.py
|  |  test_class_one.py
And in the test_class_one.py file I'm trying to import this way:
from project.class_one import ClassOne
And it still doesn't work. I'm not putting the executable project.py inside a bin dir precisely because I can't import a package from a higher-level dir. :(
Thanks. =D
It all depends on your Python path. The easiest way to achieve what you want here is to set the PYTHONPATH environment variable to the directory where "project" resides. For example, if your source lives in:
/Users/crocidb/src/project/
I would:
export PYTHONPATH=/Users/crocidb/src
and then in test_class_one.py I could:
import project.src.class_one
Actually I would probably do it this way:
export PYTHONPATH=/Users/crocidb/src/project
and then this in test_class_one.py:
import src.class_one
but that's just my preference and really depends on what the rest of your hierarchy is. Also note that if you already have something in PYTHONPATH you'll want to add to it:
export PYTHONPATH=/Users/crocidb/src/project:$PYTHONPATH
or in the other order if you want your project path to be searched last.
This all applies to Windows too, except that you would need to use Windows' syntax to set the environment variables.
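For example, in cmd.exe (mirroring the path above; note that Windows separates PYTHONPATH entries with a semicolon rather than a colon):
set PYTHONPATH=C:\Users\crocidb\src\project;%PYTHONPATH%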
From Jp Calderone's excellent blog post:
Do:
- name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
- create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your project. (Slight wrinkle: since on Windows the interpreter is selected by the file extension, your Windows users actually do want the .py extension. So, when you package for Windows, you may want to add it. Unfortunately there's no easy distutils trick that I know of to automate this process. Considering that on POSIX the .py extension is only a wart, whereas on Windows the lack is an actual bug, if your userbase includes Windows users you may want to opt to just have the .py extension everywhere.)
- if your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
- put your unit tests in a sub-package of your package (note: this means that the single Python source file option above was a trick; you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
- add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.
Don't:
- put your source in a directory called src or lib. This makes it hard to run without installing.
- put your tests outside of your Python package. This makes it hard to run the tests against an installed version.
- create a package that only has an __init__.py and then put all your code into __init__.py. Just make a module instead of a package; it's simpler.
- try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
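Putting those rules together, the resulting layout would look something like this (the executable name is illustrative):
Twisted/
|  README
|  setup.py
|  bin/
|  |  twistd
|  twisted/
|  |  __init__.py
|  |  internet.py
|  |  test/
|  |  |  __init__.py
|  |  |  test_internet.py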
Related
Consider the following Python project skeleton:
proj/
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── run.py
In this case foo holds the main project files, for example
# foo/__init__.py
class Foo:
    def run(self):
        print('Running...')
And scripts holds auxiliary scripts that need to import files from foo, which are then invoked via:
[~/proj]$ python scripts/run.py
There are two ways of importing Foo which both fail:
If a relative import is attempted from ..foo import Foo then the error is ValueError: attempted relative import beyond top-level package
If an absolute import is attempted from foo import Foo then the error is ModuleNotFoundError: No module named 'foo'
My current workaround is to append the running path to sys.path:
import sys
sys.path.append('.')
from foo import Foo
Foo().run()
But this feels like a hack, and has to be added to every new script in scripts/.
Is there a better way to structure scripts in such projects?
There are two ways you could resolve this.
(1) Turn your project into an installable package
Add a proj/setup.py file with the following contents:
import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    author="You",
    author_email="you@example.com",
    description="This is my project",
    packages=["foo"],
)
create a virtualenv:
python3 -m venv virtualenv # this creates a directory "virtualenv" in your project
source ./virtualenv/bin/activate # this switches you into the new environment
python setup.py develop # this places your "foo" package in the environment
Inside the virtualenv, foo behaves as an installed package and is importable via import foo, so you can use absolute imports in your scripts.
To make the scripts run from anywhere, without needing to activate the virtualenv first, you can point each script's shebang at the virtualenv's interpreter.
In scripts/run.py (the first line is important):
#!/path/to/proj/virtualenv/bin/python
from foo import Foo
Foo().run()
(2) Make the scripts part of the foo package
Instead of a separate subdirectory scripts, make a subpackage. In proj/foo/commands/run.py:
from .. import Foo

def main():
    Foo().run()

if __name__ == "__main__":
    main()
Then execute the script from the top-level proj/ directory with:
python -m foo.commands.run
If you combine this with (1) and install your package, you can then run python -m foo.commands.run from anywhere.
Solution
There are two main ways to achieve this. Both require creating a Python package by adding a setup.py (building on @matejcik's answer).
Option 1 (recommended): entry_points + console_scripts: register a function in your project as the entry point to script execution (e.g. foo.cli:run).
Option 2: scripts: use this keyword argument in the setup() call to reference the path to your script (e.g. bin/script.py).
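As a sketch, Option 1 might look like this in setup.py (the foo.cli:run path is hypothetical; it assumes a foo/cli.py module that defines a run() function):
# setup.py -- minimal sketch of the entry_points approach
import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    packages=["foo"],
    entry_points={
        "console_scripts": [
            # after installation, a `run` command calls foo.cli.run()
            "run = foo.cli:run",
        ],
    },
)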
Note
I recommend using a CLI library/framework like Click so that your codebase is only concerned with maintaining application-specific business logic rather than robust CLI framework logic. Also, Click recommends the entry_points + console_scripts method of script integration due to cross-platform compatibility.
Setup Tools - Automatic script creation: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
Setup Tools - keyword arguments: https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords
Click GitHub: https://github.com/pallets/click/
Click Setuptools integration: https://click.palletsprojects.com/en/master/setuptools/
Best practice? Put a single entry-point in the root
I know this might sound absurd if you have lots of scripts you want to be able to execute... But it's actually the cleanest option, and it's the one most often used in big Python projects, like manage.py in Django, for example. It also doesn't need to be a huge undertaking. Even more importantly, it is always more secure to have a single entry point than several smaller ones.
proj/
├── run.py
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── my_script.py
When run.py lives in the root directory, it can be very lightweight... basically just a wrapper that calls the function you need from my_script.py. It just ties everything together, so now all of your imports just work.
Just keep in mind that your entry point is your root. The parent of the root doesn't exist. So put your entry point in the root, and then import packages relative to the root, i.e. the code in scripts simply does import foo.
But how do I call multiple scripts!?
If you need to be able to call multiple scripts, this is a good argument for... Well... arguments! Keep run.py as your single entrypoint/command, and leverage subcommands to pass functionality to the script you care about.
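A minimal sketch of that idea, assuming scripts/my_script.py defines a main() function:
# run.py -- single entry point dispatching to scripts via subcommands
import argparse

from scripts import my_script  # resolvable because run.py sits in the root

def main():
    parser = argparse.ArgumentParser(prog="run.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("my-script", help="run scripts/my_script.py")
    args = parser.parse_args()
    if args.command == "my-script":
        my_script.main()

if __name__ == "__main__":
    main()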
Reinventing the wheel?
Generally, frameworks have already done the architecture for you to add your own subcommands, such as Django and, for a smaller footprint, Flask.
You can easily wrap up a small project without that help, though, as I've illustrated.
Security
No one, after a few years of working with a codebase, ever wishes it were less refactorable, and no one ever wishes it were less secure. As we drive toward more secure systems in general, it makes sense to create a gatekeeper script that determines what is and isn't a safe operation, and by whom. Moving the code to an LDAP-based system and need to lock things down by group? No problem: you can change the single file, or add LDAP security in your codebase, even creating your own internal API.
With distributed scripts, security options are much less flexible and much harder to maintain, and a single vulnerability could leave you wide open to exploit.
Bonus advantage
You're adding abstraction to your script base. If you ever want to change the structure of your codebase (maybe you want scripts to have subfolders with more organization), you/your users don't need to do any refactoring for any dependencies, or change paths to longer, more verbose names. Your package is self-contained, and the only thing a user will ever need to touch is your proj/run.py entry-point.
And, obviously, you don't need to play with Python paths as much!
You need to add __init__.py files to the scripts and proj folders for those to be considered Python packages and for you to be able to import from them.
One way this is also commonly done is to place your foo and scripts folders into a proj/src folder, which then has an __init__.py file and is thus a Python package.
If you like simplicity, and there are no additional restrictions on what you asked, add one __init__.py to the scripts folder and to any other sibling folders, making them packages, then always use the absolute import form. Since you said you do not want proj as a parent package, there is no __init__.py there; instead, call your scripts from inside the proj folder with:
python -m scripts.run
or whatever name you give to scripts other than run.py.
This is similar to option 2 of @matejcik's answer, but even simpler.
Another solution is to add a .pth file to your Python site-packages directory with the following content:
# your.pth
# ↓ the directory of proj
C:\...\proj
Done.
# scripts/run.py
from foo import Foo
Foo().run()
It will work well.
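If you are not sure where that directory is, you can ask the site module:
import site
print(site.getsitepackages())  # directories scanned for .pth files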
Note: if your IDE is PyCharm, then you can also use Source Roots to achieve the same thing.
Python looks for packages/modules in the directories listed in sys.path. There are several ways of ensuring that your directory of interest, in this case proj, is one of those directories:
1. Move your scripts to the proj directory. Python adds the directory containing the input script to sys.path.
2. Put the directory proj into the contents of the PYTHONPATH environment variable.
3. Make the module part of an installable package and install it, either in a virtual environment or not.
4. At run time, dynamically add the directory proj to sys.path.
Option 1 is the most logical and requires no source changes. If you are afraid that might break something, you can perhaps make scripts a symbolic link pointing back to proj?
If you are unwilling to do that, then ...
You may consider it a hack, but I would recommend that you do modify your scripts to update sys.path at runtime. But instead of '.', append an absolute path, so that the scripts can be executed regardless of what the current directory is. In your case, directory proj is the parent directory of directory scripts, where the scripts reside, so:
import sys
import os.path

# resolve an absolute path so this works regardless of the current directory
parent_directory = os.path.split(os.path.dirname(os.path.abspath(__file__)))[0]
if parent_directory not in sys.path:
    # sys.path.insert(0, parent_directory)  # index 0 is the directory of the running script, so maybe insert at index 1 instead
    sys.path.append(parent_directory)
I have seen this in the Python tutorial:
The __init__.py files are required to make Python treat the directories as containing packages;
I make the directory hierarchy in PyCharm like this, where subdir1 doesn't contain an __init__.py and subdir2 contains an __init__.py file.
First, I add the directory to sys.path.
I write a hello function in hello1.py and hello2.py respectively.
Then I call hello func in test files like this:
# test1.py
from subdir1 import hello1
hello1.hello()
# test2.py
from subdir2 import hello2
hello2.hello()
They all succeed. It seems that __init__.py is not necessary for importing modules from different directories, right?
So, I want to know in what situations an __init__.py is required. Thanks for answering!
Python 3.3+ has implicit namespace packages that allow you to create packages without __init__.py. In Python 2, __init__.py is the old method and still works.
Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit.
Note: __init__.py files were used to mark directories on your disk as Python packages.
Useful links:
Implicit Namespace Packages
__init__.py
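Even when it is not strictly required, __init__.py is still useful as a place for package-level initialization and re-exports. A small sketch, reusing the subdir2/hello2.py layout from the question:
# subdir2/__init__.py -- runs once, on first import of the package
# re-export hello so callers can write `from subdir2 import hello`
from .hello2 import hello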
All the code typed into the Python interpreter is lost when you exit the interpreter, and when writing large programs, a program is broken into multiple files for ease of use, debugging, and readability. In Python, modules are used to achieve these goals. Modules are nothing but files with Python definitions and statements. The module name to import is the name of the Python file without the .py extension.
So if you want the files under some directory to form a package that can be imported into some other piece of code, that directory needs to have an __init__.py.
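For example, with a hypothetical module file greet.py:
# greet.py -- the module name is the file name without .py
def hello():
    print("Hello!")

# in another file:
import greet
greet.hello()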
More details: an __init__.py makes Python treat the directory as a package; see the Python docs.
This should be the easiest problem on earth, but even after extensive searching and tinkering, I'm still in deep trouble finding a "correct" way to lay out a directory structure and manage to run pytest etc. correctly.
Let's say I have a program called apple.
|- README.md
|- apple
|  |-- __init__.py
|  |-- apple.py
|- tests
|  |-- test_everything.py
apple.py contains some functions; for example's sake, let's call one eat(). And the test_everything.py file contains some tests like assert eat() == "foobar". So far so good, but then the fun begins:
What about the __init__.py in the apple directory... correct? Empty or what should be inside?
Is it best practice to call py.test from the root directory? Or py.test tests?
So many projects have a __init__.py in their test directory, but that's explicitly said to be wrong in the py.test documentation. So why god why
What comes at the top of the test_everything.py file: an import apple or from apple import *? or something else entirely
Do you call the functions then by eat() or apple.eat()?
Some even recommend manipulating os.path.dirname in python
This should be easy, but I've seen every combination of the above, not even speaking about tox and the myriad of other tools. Yet with the slightest error, you get thrown some ImportError: No module named 'apple' or some other funky error.
What is the "right" way? The advice and the existing code on github etc follows extremely different conventions. For a medium-experienced coder, this should be much easier.
What about the __init__.py in the apple directory... correct? Empty or what should be inside?
Yes, correct. Most frequently empty. If you put foo = 42 in it, you can later do from apple import foo, while you'll need to do from apple.apple import foo if you put it in apple.py. While it might seem convenient, you should use this sparingly.
Is it best practice to call py.test from the root directory? Or py.test tests?
py.test should be able to find your tests regardless, but see below..
So many projects have a __init__.py in their test directory, but that's explicitly said to be wrong in the py.test documentation. So why god why
So you can import a file in tests that provides common test functionality. In py.test that might be better achieved by creating fixtures in a file called tests/conftest.py.
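A minimal sketch of such a fixture (the fixture name here is made up):
# tests/conftest.py
import pytest

from apple import apple

@pytest.fixture
def snack():
    # available to every test in tests/ without an explicit import
    return apple.eat()

A test can then simply take it as an argument: def test_eat(snack): assert snack == "foobar".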
What comes at the top of the test_everything.py file: an import apple or from apple import *? or something else entirely
from apple import apple
Do you call the functions then by eat() or apple.eat()?
apple.eat()
Some even recommend manipulating os.path.dirname in python
That seems very fragile. I would suggest either
(a) set the environment variable PYTHONPATH to point to the folder where README.md is, or better
(b) create a setup.py file (at the same level as your README.md file), here's a minimal one:
from setuptools import setup
setup(name='apple', packages=['apple'])
Run the file like so:
python setup.py develop
now apple is globally available and you should never see a "No module named apple" problem again, i.e. you can run py.test from the root folder or the tests folder.
You can read more about setup.py in the Python Packaging User Guide at https://packaging.python.org/
I have arranged an example that works here.
I think naming both the package and the module apple is confusing; perhaps that was one source of confusion. The only non-obvious part, IMO, is that you must set PYTHONPATH to the current directory if you're not using a proper setup.py (distutils) package to install your apple package.
I am packaging a python package using distutils.
My structure looks like this:
src
\ __init__.py
| util.py
| client
  \ __init__.py
  | file1.py
  | file2.py
In setup.py my package is:
package_dir={'pkg_name':'src'},
packages=['pkg_name','pkg_name.client']
I would like to be able to use the contents of client without having to import pkg_name.client, but just by importing pkg_name.
Is that possible? I just read https://docs.python.org/2/distutils/setupscript.html#listing-whole-packages. The examples I saw are still all referencing the directory name (or a different one). But is it possible to remove that, and just have it included in root? I've tried a bunch of variations but they are all failing.
Something like package_dir = {'': 'src', '': 'src/client'}
In Python, the directory containing the __init__.py will still need to be referenced, but I am hoping distutils has some trick around that.
When I first packaged, I also tried a bunch of variations before I got it working.
If your directory structure can be changed, my suggestion is to use a tool I created to myself and made it public, called machete. There are others, like cookiecutter.
https://pypi.python.org/pypi/machete/
In an empty folder, you just type:
/home/you $ machete -t app
This will generate all the structure needed, and you just need to understand its logic, and fill it with code. The structure is very different from yours, because a package has some needs.
Creating a package for the first time is kind of frustrating.
Additionally, one tricky thing machete does is to add this in the modules folder:
# file: __init__.py
# -*- coding: UTF-8 -*-
import glob

__all__ = []
py_files = glob.glob('./yourproject/modules/*.py')
py_files.remove('./yourproject/modules/__init__.py')
for py_file in py_files:
    __all__.append(py_file.split('/')[3].split('.')[0])
This way, by importing modules.*, perhaps you can achieve your goal with your directory structure. This one works for machete-generated projects.
Hope one of my suggestions helps you out.
Marco
I am developing several Python projects for several customers at the same time. A simplified version of my project folder structure looks something like this:
/path/
  to/
    projects/
      cust1/
        proj1/
          pack1/
            __init__.py
            mod1.py
        proj2/
          pack2/
            __init__.py
            mod2.py
      cust2/
        proj3/
          pack3/
            __init__.py
            mod3.py
When I, for example, want to use functionality from proj1, I extend sys.path by /path/to/projects/cust1/proj1 (e.g. by setting PYTHONPATH, adding a .pth file to the site-packages folder, or even modifying sys.path directly) and then import the module like this:
>>> from pack1.mod1 import something
As I work on more projects, it happens that different projects have identical package names:
/path/
  to/
    projects/
      cust3/
        proj4/
          pack1/    <-- same package name as in cust1/proj1 above
            __init__.py
            mod4.py
If I now simply extend sys.path by /path/to/projects/cust3/proj4, I can still import from proj1, but not from proj4:
>>> from pack1.mod1 import something
>>> from pack1.mod4 import something_else
ImportError: No module named mod4
I think the reason why the second import fails is that Python only searches the first folder in sys.path where it finds a pack1 package and gives up if it does not find the mod4 module in there. I've asked about this in an earlier question, see import python modules with the same name, but the internal details are still unclear to me.
Anyway, the obvious solution is to add another layer of namespace qualification by turning project directories into super packages: Add __init__.py files to each proj* folder and remove these folders from the lines by which sys.path is extended, e.g.
$ export PYTHONPATH=/path/to/projects/cust1:/path/to/projects/cust3
$ touch /path/to/projects/cust1/proj1/__init__.py
$ touch /path/to/projects/cust3/proj4/__init__.py
$ python
>>> from proj1.pack1.mod1 import something
>>> from proj4.pack1.mod4 import something_else
Now I am running into a situation where different projects for different customers have the same name, e.g.
/path/
  to/
    projects/
      cust3/
        proj1/    <-- same project name as for cust1 above
          __init__.py
          pack4/
            __init__.py
            mod4.py
Trying to import from mod4 does not work anymore for the same reason as before:
>>> from proj1.pack4.mod4 import yet_something_else
ImportError: No module named pack4.mod4
Following the same approach that solved this problem before, I would add yet another package / namespace layer and turn customer folders into super super packages.
However, this clashes with other requirements I have for my project folder structure, e.g.:
- a Development/Release structure to maintain several code lines
- other kinds of source code, e.g. JavaScript, SQL, etc.
- files other than source files, e.g. documents or data
A less simplified, more real-world depiction of some project folders looks like this:
/path/
  to/
    projects/
      cust1/
        proj1/
          Development/
            code/
              javascript/
                ...
              python/
                pack1/
                  __init__.py
                  mod1.py
            doc/
              ...
          Release/
            ...
        proj2/
          Development/
            code/
              python/
                pack2/
                  __init__.py
                  mod2.py
I don't see how I can simultaneously satisfy the requirements the Python interpreter has for a folder structure and the ones I have myself. Maybe I could create an extra folder structure with some symbolic links and use that in sys.path, but looking at the effort I'm already making, I have a feeling that there is something fundamentally wrong with my entire approach. On a side note, I also have a hard time believing that Python really restricts me in my choice of source code folder names as it seems to do in the case depicted.
How can I set up my project folders and sys.path so that I can import from all projects in a consistent manner when there are projects and packages with identical names?
This is the solution to my problem, although it might not be obvious at first.
In my projects, I have now introduced a convention of one namespace per customer. In every customer folder (cust1, cust2, etc.), there is an __init__.py file with this code:
import pkgutil
__path__ = pkgutil.extend_path(__path__, __name__)
All the other __init__.py files in my packages are empty (mostly because I haven't had the time yet to find out what else to do with them).
As explained here, extend_path makes sure Python is aware there is more than one sub-package within a package, physically located elsewhere and - from what I understand - the interpreter then does not stop searching after it fails to find a module under the first package path it encounters in sys.path, but searches all paths in __path__.
I can now access all code in a consistent manner criss-cross between all projects, e.g.
from cust1.proj1.pack1.mod1 import something
from cust3.proj4.pack1.mod4 import something_else
from cust3.proj1.pack4.mod4 import yet_something_else
On a downside, I had to create an even deeper project folder structure:
/path/
  to/
    projects/
      cust1/
        proj1/
          Development/
            code/
              python/
                cust1/
                  __init__.py      <-- contains the code described above
                  proj1/
                    __init__.py    <-- empty
                    pack1/
                      __init__.py  <-- empty
                      mod1.py
but that seems very acceptable to me, especially considering how little effort I need to make to maintain this convention. sys.path is extended by /path/to/projects/cust1/proj1/Development/code/python for this project.
On a side note, I noticed that of all the __init__.py files for the same customer, the one in the path that appears first in sys.path is executed, no matter from which project I import something.
You should be using the excellent virtualenv and virtualenvwrapper tools.
What happens if you accidentally import code from one customer/project in another and don't notice? When you deliver, it will almost certainly fail. I would adopt a convention of setting up PYTHONPATH for one project at a time, and not try to have everything you've ever written be importable at once.
You can use a wrapper script per-project to set PYTHONPATH and start python, or use scripts to switch environments when you switch projects.
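A sketch of such a wrapper for one project (the paths are just examples):
#!/bin/sh
# proj1-python: start Python with only cust1/proj1 importable
export PYTHONPATH=/path/to/projects/cust1/proj1
exec python "$@"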
Of course some projects will have dependencies on other projects (those libraries you mentioned), but if you intend for the customer to be able to import several projects at once, then you have to arrange for the names not to clash. You can only have this problem when you have multiple projects on the PYTHONPATH that aren't supposed to be used together.