I'm trying out one of the recommended Python package layouts with a src directory.
My problem is that when I run pytest from the command line, my tests do not find my Python package. I tried running from the top level of the directory and from within the tests directory, but I still get a ModuleNotFoundError exception. I'm running Python 3.5 with pytest 3.5.0.
What is the recommended way to execute pytest for this type of Python package layout?
If you're using py.test to run the tests, it won't find the application because you presumably haven't installed it. You can use python -m pytest to run your tests, since python automatically adds your current working directory to sys.path.
Hence, python -m pytest .. would work if you run it from the src directory.
The structure that I normally use is:
root
├── packagename
│ ├── __init__.py
│ └── ...
└── tests
└── ...
This way when I simply run python -m pytest in the root directory it works fine. Read more at: https://stackoverflow.com/a/34140498/1755083
Second option, if you still want to run this with py.test you can use:
PYTHONPATH=src/ py.test
and run it from the root. This simply adds the src folder to your PYTHONPATH and now your tests will know where to find the packagename package.
The third option is to use pip with -e to install your package in editable mode. This creates a link to the packagename folder and installs it with pip, so any modification done in packagename is immediately reflected in pip's installation (as it is a link). The issue with this is that you need to redo the installation if you move the repo, and you need to keep track of which copy is installed if you have more than one clone of the package.
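For illustration, a minimal setup.py for such a src layout could be a sketch like this (packagename is a placeholder for your actual package under src/):
# setup.py -- minimal sketch for a src layout; "packagename" is a placeholder
from setuptools import setup, find_packages

setup(
    name="packagename",
    version="0.1.0",
    package_dir={"": "src"},               # packages live under src/
    packages=find_packages(where="src"),   # finds src/packagename
)
With that in place, running pip install -e . from the repo root makes packagename importable from the tests without any path tweaks.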
Your layout looks OK and is the one recommended today.
But I assume your problem is the imports in your test_*.py files. You shouldn't have to care about where to import your package from in your unit tests; just import it.
Install in Development Mode via --editable
How could this be done? You have to "install" your package in development mode, using the --editable option of pip. In that case no real package is built and installed; instead, links are used to expose your package folder (the development source) to the interpreter as if it were a real release package.
Now your unit tests never need to care about where the package is installed or how to import it. Just import it, because the package is known to the system.
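For example, a test module can then be as plain as this sketch (assuming the package is called packagename, as in the question):
# tests/test_import.py -- hypothetical test; no sys.path manipulation needed
import packagename

def test_package_is_importable():
    assert packagename is not None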
Note: This is the way recommended today. Other "solutions" that hack around with sys.path manipulation or environment variables like PYTHONPATH date from Python's early days and should be avoided.
Just my two cents:
Don't use py.test, use pytest. The py.test command is old and deprecated.
A note for Windows users: you may or may not have PYTHONPATH defined on your system.
In such a case, I used the following commands to run tests:
set PYTHONPATH=src
pytest
To examine the PYTHONPATH:
echo %PYTHONPATH%
src
And this is the directory structure containing the src folder:
setup.py
src
    utils
        algo.py
        srel.py
        __init__.py
tests
    algo_test.py
    srel_test.py
    __init__.py
Finally, the documentation says that you might omit the __init__.py files. Removing them worked for me in this case.
I think the idea behind the src layout is to isolate the package code notably from the tests. The best way I see here is to install your package and test it as it will be delivered to future users.
Much more suitable for development. No need for hacks like import src or to modify the PYTHONPATH.
Create a virtual environment
Build your package
Install your package manually
Install your test dependencies (possibly different from the package dependencies)
Run pytest in the main directory
It works smoothly.
Related
I just transitioned from pipenv to poetry and I'm having trouble importing a package from a local package I'm developing in a few of my scripts. To make this more concrete, my project looks something like:
pyproject.toml
poetry.lock
bin/
    myscript.py
mypackage/
    __init__.py
    lots_of_stuff.py
Within myscript.py, I import mypackage. But when I poetry run bin/myscript.py I get a ModuleNotFoundError because the PYTHONPATH does not include the root of this project. With pipenv, I could solve that by specifying PYTHONPATH=/path/to/project/root in a .env file, which would be automatically loaded at runtime. What is the right way to import local packages with poetry?
I ran across this piece on using environment variables, but export POETRY_PYTHONPATH=/path/to/project/root doesn't seem to help.
After quite a bit more googling, I stumbled on the packages attribute within the tool.poetry section for pyproject.toml files. To include local packages in distribution, you can specify
# pyproject.toml
[tool.poetry]
# ...
packages = [
    { include = "mypackage" },
]
Now these packages are installed in editable mode :)
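To illustrate, after re-running poetry install so the package gets installed into the environment, the script from the question can import it directly; a sketch using the names from the question:
# bin/myscript.py -- sketch; mypackage and lots_of_stuff are the names from the question
from mypackage import lots_of_stuff  # resolves once `poetry install` has run

def main():
    print(lots_of_stuff)

if __name__ == "__main__":
    main()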
Adding a local package (in development) to another project can be done as:
poetry add ./my-package/
poetry add ../my-package/dist/my-package-0.1.0.tar.gz
poetry add ../my-package/dist/my_package-0.1.0.whl
If you want the dependency to be installed in editable mode, you can specify it in the pyproject.toml file. This means that changes in the local directory will be reflected directly in the environment.
[tool.poetry.dependencies]
my-package = {path = "../my/path", develop = true}
With the current preview release (1.2.0a), a command-line option was introduced to avoid the manual steps above:
poetry add --editable /path/to/package
Other ways of adding packages can be found on the poetry add page.
If the above doesn't work, you can take a look at the additional steps detailed in this discussion.
For reference I've looked at the following links.
Python Imports, Paths, Directories & Modules
Importing modules from parent folder
I understand that what I'm doing is wrong, and I'm trying to avoid relative paths and changing things via sys.path as much as possible, though if those are my only options, please help me come up with a solution.
Here is an example of my current working directory structure. I should add a little more context: I started off adding __init__.py to every directory so they would be considered packages and subpackages, but I'm not sure that is what I actually want.
myapp/
    pack/
        __init__.py
        helper.py
    runservice/
        service1/
            Dockerfile
        service2/
            install.py
            Dockerfile
The only packages I will be calling exist in pack/ directory, so I believe that should be the only directory considered a package by python.
Next, the reason why this might get a little tricky: ultimately, this is just a service that builds various different containers, and the entry points live in service*/install.py, where I cd into the working directory of the script. The reason for this is that I don't want container1 (service1) to know about the codebase in service2, as it's irrelevant to it, and I would like the code to be separated.
But by running install.py, I need to be able to do from pack.helper import function, and clearly I am doing something wrong.
Can someone help me come up with a solution so I can keep the entrypoint to my container as cd service2, python install.py?
Another important thing to note: within the script I have logic like:
if not os.path.isdir(os.path.expanduser(tmpDir)):
I am hoping any solution we come up with will not affect the logic here.
I apologize for the noob question.
EDIT:
Note, I think I can do something like:
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
But as far as I understand, that is bad practice....
Fundamentally what you've described is a supporting library that goes with a set of applications that run on top of it. They happen to be in the same repository (a "monorepo") but that's okay.
The first step is to take your library and package it up like a normal Python library would be. The Python Packaging User Guide has a section on Packaging and distributing projects, which is mostly relevant; though you're not especially interested in uploading the result to PyPI. You at the very least need the setup.py file described there.
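At its simplest, that setup.py could be a sketch like this (the names follow the reorganized layout shown just below):
# pack/setup.py -- minimal sketch
from setuptools import setup, find_packages

setup(
    name="pack",
    version="0.1",
    packages=find_packages(),  # picks up the inner pack/ package with helper.py
)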
With this reorganization you should be able to do something like
$ ls pack
pack/ setup.py
$ ls pack/pack
__init__.py helper.py
$ virtualenv vpy
$ . vpy/bin/activate
(vpy) $ pip install -e ./pack
The last two lines are important: in your development environment they create a Python virtual environment, an isolated set of packages, and then install your local library package into it. Still within that virtual environment, you can now run your scripts
(vpy) $ cd runservice/service2
(vpy) $ ./install.py
Your scripts do not need to modify sys.path; your library is installed in an "expected" place.
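For illustration, service2's entry point can then be as simple as this sketch (function is the name used in the question; the rest is hypothetical):
#!/usr/bin/env python
# runservice/service2/install.py -- sketch
from pack.helper import function  # resolves because `pack` is pip-installed

if __name__ == "__main__":
    function()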
You can and should do live development in this environment. pip install -e makes the virtual environment's source code for whatever's in pack be your actual local source tree. If service2 happens to depend on other Python libraries, listing them out in a requirements.txt file is good practice.
Once you've migrated everything into the usual Python packaging scheme, it's straightforward to transplant this into Docker. The Docker image here plays much the same role as a Python virtual environment, in that it has an isolated Python installation and an isolated library tree. So a Dockerfile for this could more or less look like
FROM python:2.7
# Copy and install the library
WORKDIR /pack
COPY pack/ ./
RUN pip install .
# Now copy and install the application
WORKDIR /app
COPY runservice/service2/ ./
# RUN pip install -r requirements.txt
# Set standard metadata to run the application
CMD ["./install.py"]
That Dockerfile depends on being run from the root of your combined repository tree
sudo docker build -f runservice/service2/Dockerfile -t me/service2 .
A relevant advanced technique is to break this up into separate Docker images. One contains the base Python plus your installed library, and the per-application images build on top of that. This avoids reinstalling the library multiple times if you need to build all of the applications, but it also leads to a more complicated sequence with multiple docker build steps.
# pack/Dockerfile
FROM python:2.7
WORKDIR /pack
COPY ./ ./
RUN pip install .
# runservice/service2/Dockerfile
FROM me/pack
WORKDIR /app
COPY runservice/service2/ ./
CMD ["./install.py"]
#!/bin/sh
set -e
(cd pack && docker build -t me/pack .)
(cd runservice/service2 && docker build -t me/service2 .)
I have a python application (Django based), and I have a couple of standalone maintenance scripts that go along with the application, that I have to call every now and then. They have to import parts of my application (sub-packages). Currently, I just put them in my toplevel directory:
application/
    djangoproject/
    djangoapp/
    otherpackage/
    tool1.py
    tool2.py
Where tool1.py would do
from djangoproject import wsgi
from djangoapp.models import Poll
I've accumulated quite a few of these tools, and would like to move them to a scripts subdirectory. Then I would like to be able to call them via python scripts/tool1.py or maybe cd scripts; python tool1.py.
I understand (and sometimes lament) how Python's imports work, and I know that I can add some lines to each script to add the parent directory to PYTHONPATH. I am wondering if there is a widespread pattern to handle such a collection of assorted scripts. Maybe one could put the path manipulation into another file, and have every script start with import mainproject?
I am using a virtualenv, and installing dependencies with pip. But the application itself currently doesn't use a setup.py, and I think it wouldn't help to move the scripts to a separate package installed via pip, since I change them a lot during development, and there are lots of one-offs.
The ways of organizing source code vary from project to project. From my years of experience, the best and most Pythonic way is to always have a setup.py.
In that case, you can make pip install -e . and the editable version from . dir will be pseudo-installed to the virtualenv. Actually, not really installed (i.e. copied), but "linked": the source code dir will be added to sys.path with .pth files, so you can edit & try without any special copying/installing steps afterward.
More than that, you can extend setup.py with extra dependencies, e.g. for development purposes, and install them with pip install -e .[dev]. This is more of a nice bonus than a necessity.
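As a sketch of what that extras declaration looks like (the package and dependency names here are illustrative, not taken from the question):
# setup.py -- sketch of development extras
from setuptools import setup, find_packages

setup(
    name="application",
    version="0.1",
    packages=find_packages(),
    install_requires=["Django"],
    extras_require={
        "dev": ["pytest", "flake8"],  # installed with `pip install -e .[dev]`
    },
)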
The rest depends on the nature of the scripts.
If the scripts are part of the application, they should be installed via the entry-points in setup.py.
# setup.py:
setup(
    entry_points={
        'console_scripts': [
            'tool1 = mytools.tool1:main',
            'tool2 = mytools.tool2:main',
        ],
    },
)
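Each entry point only needs a callable; a sketch of what mytools/tool1.py behind the tool1 entry point could look like:
# mytools/tool1.py -- hypothetical module behind the `tool1` entry point
import argparse

def main():
    parser = argparse.ArgumentParser(description="example maintenance tool")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()
    if args.dry_run:
        print("would run the maintenance task")
    else:
        print("running the maintenance task")

if __name__ == "__main__":
    main()  # also allows running the module directly during development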
In that case, after pip install -e ., they will be in the bin folder of the virtualenv, or in /usr/local/bin or the like if the system python is used. You can execute them like this:
source .venv/bin/activate
tool1 ...
# OR:
~/path/to/venv/bin/tool2
The scripts installed this way are fully aware of the virtualenv, to which they were installed, so no activation and no explicit python binary are needed.
If the scripts are for the code maintenance, and not semantically part of the application, then they are usually put into ./scripts/ directory (or any other, e.g. ./ci/), with shebang at the top (#!/usr/bin/env python). E.g., tool1.py:
#!/usr/bin/env python

def main():
    pass

if __name__ == '__main__':
    main()
And executed in the current virtualenv due to this shebang as follows:
source .venv/bin/activate
./scripts/tool1.py ...
# OR:
~/path/to/venv/bin/python ./scripts/tool1.py
Unlike the scripts installed via the entry points, these scripts do not know about their own virtualenv in any way, so the virtualenv must be activated or the proper python binary used explicitly.
This approach is also used when the scripts are not Python, e.g. for bash scripts.
In both cases, the requirements.txt file is sometimes used to pin the application's & dependencies' versions (with pip freeze), so that the deployments would be persistent & predictable. But this is another story — about the deployment of the application, not about the packaging & maintenance.
The requirements.txt file is regenerated from time to time to satisfy the new unpinned (i.e. flexible) requirements in setup.py and the new package versions available. But usually it is the generated content (despite being committed in the repo), not the content maintained by hand.
If you strictly do not want to have setup.py for any reason, then either execute those scripts with the modified env var:
PYTHONPATH=. python scripts/tool1.py
Or hack the sys.path from inside:
# tool1.py
import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
This is exactly what pip install -e . does, just done manually on every call, not once with the .pth file in the virtualenv. And also this looks hacky.
However, as we know, neither hacky solutions nor the duplicating solutions, especially those duplicating the standard toolkit, are considered "pythonic".
I am experimenting with putting my Python code into the standard directory structure used for deployment with setup.py and maybe PyPI. For a Python library called mylib, it would be something like this:
mylibsrc/
    README.rst
    setup.py
    bin/
        some_script.py
    mylib/
        __init__.py
        foo.py
There's often also a test/ subdirectory but I haven't tried writing unit tests yet. The recommendation to have scripts in a bin/ subdirectory can be found in the official Python packaging documentation.
Of course, the scripts start with code that looks like this:
#!/usr/bin/env python
from mylib.foo import something
something("bar")
This works well when it eventually comes to deploying the script (e.g. to devpi) and then installing it with pip. But if I run the script directly from the source directory, as I would while developing new changes to the library/script, I get this error:
ImportError: No module named 'mylib'
This is true even if the current working directory is the root mylibsrc/ and I ran the script by typing ./bin/some_script.py. This is because Python starts searching for packages in the directory of the script being run (i.e. from bin/), not the current working directory.
What is a good, permanent way to make it easy to run scripts while developing packages?
Here is a relevant other question (especially comments to the first answer).
The solutions for this that I've found so far fall into three categories, but none of them are ideal:
Manually fix up your Python's module search path somehow before running your scripts.
You can manually add mylibsrc to your PYTHONPATH environment variable. This seems to be the most official (Pythonic?) solution, but means that every time I check out a project I have to remember to manually change my environment before I can run any code in it.
Add . to the start of your PYTHONPATH environment variable. As I understand it this could have some security problems. This would actually be my favoured trick if I was the only person to use my code, but I'm not, and I don't want to ask others to do this.
While looking at answers on the internet, for files in a test/ directory I've seen recommendations that they all (indirectly) include a line of code sys.path.insert(0, os.path.abspath('..')) (e.g. in structuring your project). Yuck! This seems like a bearable hack for files that are only for testing, but not those that will be installed with the package.
Edit: I have since found an alternative, which turns out to be in this category: by running the scripts with Python's -m switch, the search path starts in the working directory instead of the bin/ directory. See my answer below for more details.
Install the package to a virtual environment before using it, using a setup.py (either running it directly or using pip).
This seems like overkill if I'm just testing a change that I'm not sure is even syntactically correct yet. Some of the projects I'm working on aren't even meant to be installed as packages but I want to use the same directory structure for everything, and this would mean writing a setup.py just so I could test them!
Edit: Two interesting variants of this are discussed in the answers below: the setup.py develop command in logc's answer and pip install -e in mine. They avoid having to re-"install" for every little edit, but you still need to create a setup.py for packages you never intend to fully install, and they don't work very well with PyCharm (which has a menu entry to run the develop command but no easy way to run the scripts that it copies to the virtual environment).
Move the scripts to the project's root directory (i.e. in mylibsrc/ instead of mylibsrc/bin/).
Yuck! This is a last resort but, unfortunately, this seems like the only feasible option at the moment.
Run modules as scripts
Since I posted this question, I've learnt that you can run a module as if it were a script using Python's -m command-line switch (which I had thought only applied to packages).
So I think the best solution is this:
Instead of writing wrapper scripts in a bin subdirectory, put the bulk of the logic in modules (as you should anyway), and put at the end of relevant modules if __name__ == "__main__": main(), as you would in a script.
To run the scripts on the command line, call the modules directly like this: python -m pkg_name.module_name (see the sketch after this list).
If you have a setup.py, as Alik said you can generate wrapper scripts at installation time so your users don't need to run them in this funny way.
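A sketch of what such a module looks like (the names are placeholders):
# pkg_name/module_name.py -- hypothetical; run with `python -m pkg_name.module_name`
def main():
    print("doing the real work here")

if __name__ == "__main__":
    main()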
PyCharm doesn't support running modules in this way (see this request). However, you can just run modules (and also scripts in bin) like normal because PyCharm automatically adds the project root to the PYTHONPATH, so import statements resolve without any further effort. There are a few gotchas for this though:
The main problem is that the working directory will be incorrect, so opening data files won't work. Unfortunately there is no quick fix; the first time you run each script, you must stop it and change its configured working directory (see this link).
If your package directory is not directly within the root project directory, you need to mark its parent directory as a source directory in the project structure settings page.
Relative imports don't work i.e. you can do from pkg_name.other_module import fn but not from .other_module import fn. Relative imports are usually poor style anyway, but they're useful for unit tests.
If a module has a circular dependency and you run it directly, it will end up being imported twice (once as pkg_name.module_name and once as __main__). But you shouldn't have circular dependencies anyway.
Bonus command line fun:
If you still want to put some scripts in bin/ you can call them with python -m bin.scriptname (but in Python 2 you'll need to put an __init__.py in the bin directory).
You can even run the overall package, if it has a __main__.py, like this: python -m pkg_name (a sketch of such a __main__.py follows below)
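For completeness, such a __main__.py can be as small as this sketch (again with placeholder names):
# pkg_name/__main__.py -- hypothetical; executed by `python -m pkg_name`
from pkg_name.module_name import main

if __name__ == "__main__":
    main()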
Pip editable mode
There is an alternative for the command line, which is not as simple, but still worth knowing about:
Use pip's editable mode, documented here
To use it, make a setup.py, and use the following command to install the package into your virtual environment: pip install -e .
Note the trailing dot, which refers to the current directory.
This puts the scripts generated from your setup.py in your virtual environment's bin directory, and links to your package source code so you can edit and debug it without re-running pip.
When you're done, you can run pip uninstall pkg_name
This is similar to setup.py's develop command, but uninstallation seems to work better.
The simplest way is to use setuptools in your setup.py script, and use the entry_points keyword, see the documentation of Automatic Script Creation.
In more detail: you create a setup.py that looks like this
from setuptools import setup

setup(
    # other arguments here...
    entry_points={
        'console_scripts': [
            'foo = my_package.some_module:main_func',
            'bar = other_module:some_func',
        ],
        'gui_scripts': [
            'baz = my_package_gui:start_func',
        ]
    }
)
then create other Python packages and modules underneath the directory where this setup.py exists, e.g. following the above example:
.
├── my_package
│ ├── __init__.py
│ └── some_module.py
├── my_package_gui
│ └── __init__.py
├── other_module.py
└── setup.py
and then run
$ python setup.py install
or
$ python setup.py develop
Either way, new Python scripts (executable scripts without the .py suffix) are created for you that point to the entry points you have described in setup.py. Usually, they are at the Python interpreter's notion of "directory where executable binaries should be", which is usually on your PATH already. If you are using a virtual env, then virtualenv tricks the Python interpreter into thinking this directory is bin/ under wherever you have defined that the virtualenv should be. Following the example above, in a virtualenv, running the previous commands should result in:
bin
├── bar
├── baz
└── foo
I am new to Python and have mostly used my own code. But now I have downloaded a package that I need for a problem I have.
Example structure:
root\
    externals\
        __init__.py
        cowfactory\
            __init__.py
            cow.py
            milk.py
            kittens.py
Now cowfactory's __init__.py does from cowfactory import cow. This gives an import error.
I could fix it and change the import statement to from externals.cowfactory import cow, but something tells me there is an easier way, since that's not very practical.
Another fix could be to put the cowfactory package in the root of my project, but that's not very tidy either.
I think I have to do something with the __init__.py file in the externals directory but I am not sure what.
Inside the cowfactory package, relative imports should be used such as from . import cow. The __init__.py file in externals is not necessary. Assuming that your project lies in root\ and cowfactory is the external package you downloaded, you can do it in two different ways:
Install the external module
External Python packages usually come with a setup.py file that allows you to install them. On Windows, you would run the command setup.py bdist_wininst and get an EXE installer in the dist directory (if it builds correctly). Use that installer and the package will be installed in the Python installation directory. Afterwards, you can simply do import cowfactory just like you would do import os.
If you have pip or easy_install installed: Many external packages can be installed with them (pip even allows easy uninstallation).
Use PYTHONPATH for development
If you want to keep all dependencies together in your project directory, then keep all external packages in the externals\ folder and add the folder to the PYTHONPATH. If you're using the command line, you can create a batch file containing something like
set PYTHONPATH=%PYTHONPATH%;externals
yourprogram.py
I'm actually doing something similar, but using PyDev+Eclipse. There, you can change the "Run configurations" to include the environment variable PYTHONPATH with the value externals. After the environment variable is set, you can simply import cowfactory in your own modules. Note how that is better than from externals import cowfactory, because in the latter case it wouldn't work anymore once you install your project (or you'd have to install all external dependencies as a package called "externals", which is a bad idea).
The same solutions of course apply to Linux as well, but with different commands.
Generally, you would use easy_install or pip to install it for you in the appropriate directory. There is a site-packages directory on Windows where you can put the package if you can't use easy_install for some reason. On Ubuntu, it's /usr/lib/pythonX.Y/dist-packages. Google for your particular system. Or you can put it anywhere on your PYTHONPATH environment variable.
As a general rule, it's good not to put third-party libs in your program's directory structure (although there are differing opinions on this vis-à-vis source control). This keeps your directory structure as minimalist as possible.
The easiest way is to use the environment variable PYTHONPATH. You set it before running your scripts as follows:
export PYTHONPATH=$PYTHONPATH:/root/externals/
You can add as many folders as you want (provided they're separated by :) and Python will look in all those folders when importing.