The topic of namespace packages seems a bit confusing for the uninitiated, and it doesn't help that prior versions of Python have implemented it in a few different ways or that a lot of the Q&A on StackOverflow are dated. I am looking for a solution in Python 3.5 or later.
#The scenario:
I'm in the process of refactoring a bunch of Python code into modules and submodules, and working to get each of these projects set up to operate independently of each other while sitting in the same namespace.
We're eventually going to be using an internal PyPI server to serve these packages to our internal network, and we don't want to confuse them with external (public) PyPI packages.
Example: I have 2 modules, and I would like to be able to perform the following:
from org.client.client1 import mod1
from org.common import config
The reflected modules would be separated as such:
Repository 1:
org_client_client1_mod1/
    setup.py
    mod1/
        __init__.py
        somefile.py
Repository 2:
org_common_config/
    setup.py
    config/
        __init__.py
        someotherfile.py
My Git repositories are already set up as org_client_client1_mod1 and org_common_config, so I just need to perform the setup on the packaging and __init__.py files, I believe.
Questions:
#1
With the __init__.py, which of these should I be using (if any)?:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
Or:
import pkg_resources
pkg_resources.declare_namespace(__name__)
#2
With setup.py, do I still need to add the namespace_modules parameter, and if so, would I use namespace_modules=['org.common'],
or namespace_modules=['org', 'common']?
#3
Could I forgo all of the above by just implementing this differently somehow? Perhaps something simpler or more "pythonic"?
Late to the party, but never hurts to help fellow travellers down the namespace path in Python!
#1:
With the __init__.py, which of these should I be using (if any)?:
It depends. There are three ways to do namespace packages, as listed here:
Use native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
Use pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
Use pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
If you are using #2 (pkgutil-style) or #3 (pkg_resources-style), then you will have to use the corresponding style in your __init__.py files. If you use native namespace packages, then there must be no __init__.py in the namespace directory.
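To make the native (PEP 420) option concrete, here is a minimal sketch that fakes two separately installed repositories in temporary directories. The helper name make_portion and the module contents are made up for illustration; the only point is that neither repository contains org/__init__.py, yet both halves import under the same org namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root, subpackage, module, source):
    """Create <root>/org/<subpackage>/<module>.py without any org/__init__.py."""
    pkg_dir = Path(root) / "org" / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")  # the subpackage is a regular package
    (pkg_dir / (module + ".py")).write_text(source)


# Two independent "repositories", standing in for two installed distributions.
workspace = Path(tempfile.mkdtemp())
repo_common = workspace / "org_common_config"
repo_client = workspace / "org_client_client1_mod1"
make_portion(repo_common, "common", "config", "NAME = 'common config'\n")
make_portion(repo_client, "client", "mod1", "NAME = 'client mod1'\n")

# Put both roots on sys.path, as two installs would effectively do.
sys.path[:0] = [str(repo_common), str(repo_client)]
importlib.invalidate_caches()

from org.common import config
from org.client import mod1
import org

# The namespace package has one __path__ entry per contributing directory.
portions = list(org.__path__)
print(config.NAME, "|", mod1.NAME, "|", len(portions), "portions")
```

If you drop an __init__.py into either org/ directory, that directory becomes a regular package and hides the other portion, which is why the native style requires the file to be absent.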
#2:
With setup.py, do I still need to add the namespace_modules parameter, and if so, would I use namespace_modules=['org.common'], or namespace_modules=['org', 'common']?
If your choice of namespace package is not native style, then yes, you will need namespace_packages (note the name: it is namespace_packages, not namespace_modules) in your setup().
#3:
Could I forgo all of the above by just implementing this differently somehow? Perhaps something simpler or more "pythonic"?
Since you have worked your way down to a complex topic in Python, it seems you know what you are doing and what you want, and you have identified that creating a Python namespace package is the way to do it. This would be considered a Pythonic way to solve the problem.
Adding to your questions, here are a few things I discovered:
I read PEP420, the Python Packaging guide and spent a lot of time understanding the namespace packages, and I generally understood how it worked. I read through a couple of answers here, here, here, and this thread on SO as well - the example here and on the Git link shared by Rob.
My problem, however, came after I created my package. As all the instructions and sample code explicitly listed the packages in the setuptools.setup(packages=[]) argument, my code failed: my sub-packages/directories were not included. Digging deeper, I found out that setuptools has a find_namespace_packages() function that helps in adding sub-packages too.
EDIT:
Link to find_namespace_packages() (setuptools version 40.1.0 or later): https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages
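To see why explicitly listed packages or find_packages() can miss sub-packages in this layout, here is a rough standard-library approximation of the two lookups. This is not setuptools' implementation, and the discover function is a made-up name; it only mimics the one rule that matters here: the classic lookup skips any directory without __init__.py (and everything beneath it), while the namespace lookup accepts every directory.

```python
import os
import tempfile
from pathlib import Path


def discover(where, require_init):
    """Return dotted names of candidate package directories under `where`.

    require_init=True  roughly mimics find_packages(): a directory counts only
                       if it holds __init__.py, and skipped directories are
                       not searched any further.
    require_init=False roughly mimics find_namespace_packages(): every
                       directory counts.
    """
    found = []
    for current, dirs, _files in os.walk(where):
        kept = []
        for name in sorted(dirs):
            full = os.path.join(current, name)
            if "." in name:
                continue  # e.g. something.egg-info is not a package name
            if require_init and not os.path.isfile(os.path.join(full, "__init__.py")):
                continue
            found.append(os.path.relpath(full, where).replace(os.sep, "."))
            kept.append(name)
        dirs[:] = kept  # prune the walk so skipped trees are not descended into
    return found


# Layout: org/ has no __init__.py; org/common/ and org/common/config/ do.
layout_root = Path(tempfile.mkdtemp())
(layout_root / "org" / "common" / "config").mkdir(parents=True)
(layout_root / "org" / "common" / "__init__.py").write_text("")
(layout_root / "org" / "common" / "config" / "__init__.py").write_text("")

classic = discover(str(layout_root), require_init=True)
namespace = discover(str(layout_root), require_init=False)
print("classic lookup:  ", classic)
print("namespace lookup:", namespace)
```

Because org/ itself has no __init__.py, the classic lookup never reaches org.common at all, which matches the "my sub-packages were not included" symptom.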
EDIT (08/09/2019):
To complete the answer, let me also restructure with an example.
The following solution is assuming Python 3.3+ which has support for implicit namespace packages
Since you are looking for a solution for Python version 3.5 or later, let's take the code samples provided and elaborate further.
Let's assume the following:
Namespace/Python package name : org
Distribution packages: org_client, org_common
Python: 3.3+
setuptools: 40.1.0
For you to do the following
from org.client.client1 import mod1
from org.common import config
And keeping your top level directories the same, viz. org_client_client1_mod1 and org_common_config, you can change your structure to the following
Repository 1:
org_client_client1_mod1/
    setup.py
    org/
        client/
            client1/
                __init__.py
                submod1/
                    __init__.py
                mod1/
                    __init__.py
                    somefile.py
                file1.py
Updated setup.py
from setuptools import find_namespace_packages, setup
setup(
    name="org_client",
    ...
    packages=find_namespace_packages(), # Follows similar lookup as find_packages()
    ...
)
Repository 2:
org_common_config/
    setup.py
    org/
        common/
            __init__.py
            config/
                __init__.py
                someotherfile.py
Updated setup.py:
from setuptools import find_namespace_packages, setup
setup(
    name="org_common",
    ...
    packages=find_namespace_packages(), # Follows similar lookup as find_packages()
    ...
)
To install (using pip):
(venv) $ pip3 install org_common_config/
(venv) $ pip3 install org_client_client1_mod1/
Updated pip list will show the following:
(venv) $ pip3 list
...
org_client
org_common
...
However, the distribution names org_client and org_common are not importable. To import the code, you have to use the org.client and org.common notation.
To understand why, you can browse here (assuming inside venv):
(venv) $ cd venv/lib/python3.5/site-packages/
(venv) $ ls -l | grep org
You'll see that there are no org_client or org_common package directories; they are interpreted as a namespace package.
(venv) $ cd venv/lib/python3.5/site-packages/org/
(venv) $ ls -l
client/
common/
...
This is a tough subject. All the -'s, _'s, and __init__.py's everywhere don't exactly make it easy on us.
First, I'll answer your questions:
With the __init__.py, which of these should I be using (if any)?
__init__.py can be completely empty, it just needs to be in the correct place. Namely (pun) they should be in any subpackage containing python code (excluding setup.py.) Follow those rules and you should be fine.
With setup.py, do I still need to add the namespace_modules parameter, and if so, would I use namespace_modules=['org.common'], or namespace_modules=['org', 'common']?
Nope! Only name= and packages=. However, note the format of the packages= arg compared against the directory structure.
Here's the format of the packages= arg:
Here's the corresponding directory structure:
Could I forgo all of the above by just implementing this differently somehow? Perhaps something simpler or more "pythonic"?
If you want to be able to install multiple features individually, but under the same top-level namespace, you're on the right track.
I'll spend the rest of this answer re-implementing your namespace package in native format:
I'll put all helpful documentation I've been able to find at the bottom of the post.
K so I'm going to assume you want native namespace packages. First let's look at the current structure of your 2 repos:
org_client_client1_mod1/
    setup.py
    mod1/
        __init__.py
        somefile.py
&
org_common_config/
    setup.py
    config/
        __init__.py
        someotherfile.py
This^ would be too easy!!!
To get what you want:
My brain isn't elastic enough to know if we can go 3-levels deep with namespace packages, but to do what you want, here's what I'm pretty sure you'd want to do:
org-client/
    setup.py
    org/
        client/
            client1/
                __init__.py
                mod1/
                    __init__.py
                    somefile.py
&
org-common-but-also-note-this-name-doesnt-matter/
    setup.py
    org/
        common/
            __init__.py
            config/
                __init__.py
                someotherfile.py
Basically then the key is going to be specifying the correct name= & packages= args to setuptools.setup() inside of each setup.py.
These are going to be:
name='org_client',
...
packages=['org.client']
&
name='org_common',
...
packages=['org.common']
respectively.
Then just install each one with pip install . inside each top-level dir.
Installing the first one will give you access to the somefile.py module, and installing the second will give you access to someotherfile.py. It also won't get confused about you trying to install 2 packages named org in the same environment.
K so the most helpful section of the docs: https://packaging.python.org/guides/packaging-namespace-packages/#packaging-namespace-packages
And then here's how I actually came to understand this: https://github.com/pypa/sample-namespace-packages/tree/master/native
I want to inherit from a class in a file that lies in a directory above the current one.
Is it possible to relatively import that file?
from ..subpkg2 import mod
Per the Python docs: When inside a package hierarchy, use two dots, as the import statement doc says:
When specifying what module to import you do not have to specify the absolute name of the module. When a module or package is contained within another package it is possible to make a relative import within the same top package without having to mention the package name. By using leading dots in the specified module or package after from you can specify how high to traverse up the current package hierarchy without specifying exact names. One leading dot means the current package where the module making the import exists. Two dots means up one package level. Three dots is up two levels, etc. So if you execute from . import mod from a module in the pkg package then you will end up importing pkg.mod. If you execute from ..subpkg2 import mod from within pkg.subpkg1 you will import pkg.subpkg2.mod. The specification for relative imports is contained within PEP 328.
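Here is a small self-contained sketch of the dot rules quoted above. It writes a throwaway pkg tree into a temporary directory (the file contents and the user.py module are made up for the demo) and then imports a module that uses both forms of the two-dot import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
files = {
    "pkg/__init__.py": "",
    "pkg/mod.py": "WHO = 'pkg.mod'\n",
    "pkg/subpkg1/__init__.py": "",
    "pkg/subpkg2/__init__.py": "",
    "pkg/subpkg2/mod.py": "WHO = 'pkg.subpkg2.mod'\n",
    # This module lives in pkg.subpkg1, so two dots mean "pkg".
    "pkg/subpkg1/user.py": (
        "from .. import mod as top_mod\n"
        "from ..subpkg2 import mod as sub_mod\n"
    ),
}
for relative_name, source in files.items():
    target = root / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

# Make the directory that CONTAINS pkg importable, then import by full name.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
user = importlib.import_module("pkg.subpkg1.user")

print(user.top_mod.__name__)  # resolved from "from .. import mod"
print(user.sub_mod.__name__)  # resolved from "from ..subpkg2 import mod"
```

Note that the dots count package levels, not directories on disk; they only work because user.py was imported as pkg.subpkg1.user.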
PEP 328 deals with absolute/relative imports.
import sys
sys.path.append("..") # Adds higher directory to python modules path.
@gimel's answer is correct if you can guarantee the package hierarchy he mentions. If you can't -- if your real need is as you expressed it, exclusively tied to directories and without any necessary relationship to packaging -- then you need to work on __file__ to find out the parent directory (a couple of os.path.dirname calls will do;-), then (if that directory is not already on sys.path) temporarily insert said dir at the very start of sys.path, __import__, remove said dir again -- messy work indeed, but, "when you must, you must" (and Python strives to never stop the programmer from doing what must be done -- just like the ISO C standard says in the "Spirit of C" section in its preface!-).
Here is an example that may work for you:
import sys
import os.path
sys.path.append(
os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
import module_in_parent_dir
Import module from a directory which is exactly one level above the current directory:
from .. import module
How to load a module that is a directory up
preface: I did a substantial rewrite of a previous answer with the hopes of helping ease people into python's ecosystem, and hopefully give everyone the best chance of success with python's import system.
This will cover relative imports within a package, which I think is the most probable case to OP's question.
Python is a modular system
This is why we write import foo to load a module "foo" from the root namespace, instead of writing:
foo = dict()  # please avoid doing this
with open(os.path.join(os.path.dirname(__file__), '../foo.py')) as foo_fh:  # please avoid doing this
    exec(compile(foo_fh.read(), 'foo.py', 'exec'), foo)  # please avoid doing this
Python isn't coupled to a file-system
This is why we can embed python in environments where there isn't a de facto filesystem, without providing a virtual one, such as Jython.
Being decoupled from a filesystem lets imports be flexible, this design allows for things like imports from archive/zip files, import singletons, bytecode caching, cffi extensions, even remote code definition loading.
So if imports are not coupled to a filesystem what does "one directory up" mean? We have to pick out some heuristics but we can do that, for example when working within a package, some heuristics have already been defined that makes relative imports like .foo and ..foo work within the same package. Cool!
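As a quick illustration of the "imports from archive/zip files" point, here is a sketch that imports a module straight out of a zip archive, with no .py file for it anywhere on disk. The archive name bundle.zip and the module name zipped_foo are made up for the demo.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build an archive holding a single module.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(str(archive), "w") as zf:
    zf.writestr(
        "zipped_foo.py",
        "def greet():\n    return 'hello from inside a zip'\n",
    )

# A zip file is a valid sys.path entry; the import system looks inside it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
import zipped_foo

print(zipped_foo.greet())
print(zipped_foo.__file__)  # a path that points inside bundle.zip
```

This is why "one directory up" has no universal meaning for an imported module: here the module's location is not a directory at all.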
If you sincerely want to couple your source code loading patterns to a filesystem, you can do that. You'll have to choose your own heuristics, and use some kind of importing machinery, I recommend importlib
Python's importlib example looks something like so:
import importlib.util
import os
import sys
# For illustrative purposes.
file_path = os.path.join(os.path.dirname(__file__), '../foo.py')
module_name = 'foo'
foo_spec = importlib.util.spec_from_file_location(module_name, file_path)
# foo_spec is a ModuleSpec specifying a SourceFileLoader
foo_module = importlib.util.module_from_spec(foo_spec)
sys.modules[module_name] = foo_module
foo_spec.loader.exec_module(foo_module)
foo = sys.modules[module_name]
# foo is the sys.modules['foo'] singleton
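If you do go the importlib route, the same steps can be wrapped in a small helper so the file path is the only heuristic you choose. This is just a sketch: load_module_from_path is a made-up name, and the demo loads a throwaway file from a temporary directory rather than a real ../foo.py.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Load a single source file as a module and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError("cannot build an import spec for %r" % (file_path,))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind if its code fails.
        sys.modules.pop(module_name, None)
        raise
    return module


# Demo with a throwaway file; in a real script the path might instead be
# built from __file__, as in the example above.
demo_file = Path(tempfile.mkdtemp()) / "foo_demo.py"
demo_file.write_text("ANSWER = 6 * 7\n")

foo_demo = load_module_from_path("foo_demo", demo_file)
print(foo_demo.ANSWER)
```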
Packaging
There is a great example project available officially here: https://github.com/pypa/sampleproject
A python package is a collection of information about your source code, that can inform other tools how to copy your source code to other computers, and how to integrate your source code into that system's path so that import foo works for other computers (regardless of interpreter, host operating system, etc)
Directory Structure
Let's have a package named foo, in some directory (preferably an empty directory).
some_directory/
foo.py # `if __name__ == "__main__":` lives here
My preference is to create setup.py as sibling to foo.py, because it makes writing the setup.py file simpler, however you can write configuration to change/redirect everything setuptools does by default if you like; for example putting foo.py under a "src/" directory is somewhat popular, not covered here.
some_directory/
    foo.py
    setup.py
#!/usr/bin/env python3
# setup.py
import setuptools
setuptools.setup(
    name="foo",
    ...
    py_modules=['foo'],
)
python3 -m pip install --editable ./ # or path/to/some_directory/
"editable" aka -e will yet-again redirect the importing machinery to load the source files in this directory, instead of copying the current exact files to the installing-environment's library. This can also cause behavioral differences on a developer's machine, be sure to test your code!
There are tools other than pip, however I'd recommend pip be the introductory one :)
I also like to make foo a "package" (a directory containing __init__.py) instead of a module (a single ".py" file). Both "packages" and "modules" can be loaded into the root namespace, but packages allow for nested namespaces, which is helpful if we want to have a "relative one directory up" import.
some_directory/
    foo/
        __init__.py
    setup.py
#!/usr/bin/env python3
# setup.py
import setuptools
setuptools.setup(
    name="foo",
    ...
    packages=['foo'],
)
I also like to make a foo/__main__.py, this allows python to execute the package as a module, eg python3 -m foo will execute foo/__main__.py as __main__.
some_directory/
    foo/
        __init__.py
        __main__.py  # `if __name__ == "__main__":` lives here, `def main():` too!
    setup.py
#!/usr/bin/env python3
# setup.py
import setuptools
setuptools.setup(
    name="foo",
    ...
    packages=['foo'],
    ...
    entry_points={
        'console_scripts': [
            # "foo" will be added to the installing-environment's text mode shell, eg `bash -c foo`
            'foo=foo.__main__:main',
        ]
    },
)
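For the foo/__main__.py part, here is one possible shape of that file, exercised end to end. This is only a sketch: the body of main() is invented, and the demo builds a throwaway foo package in a temporary directory and runs it with the current interpreter's -m switch instead of installing anything.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

project = Path(tempfile.mkdtemp())  # stands in for some_directory/
package = project / "foo"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "__main__.py").write_text(
    "def main():\n"
    "    print('hello from main()')\n"
    "    return 0\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    raise SystemExit(main())\n"
)

# `python -m foo` executes foo/__main__.py with __name__ == "__main__".
result = subprocess.run(
    [sys.executable, "-m", "foo"],
    cwd=str(project),
    capture_output=True,
    text=True,
)
print(result.returncode, result.stdout.strip())
```

Keeping the logic in a main() function is what lets the console_scripts entry point 'foo=foo.__main__:main' and python3 -m foo share the same code path.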
Let's flesh this out with some more modules:
Basically, you can have a directory structure like so:
some_directory/
    bar.py  # `import bar`
    foo/
        __init__.py  # `import foo`
        __main__.py
        baz.py  # `import foo.baz`
        spam/
            __init__.py  # `import foo.spam`
            eggs.py  # `import foo.spam.eggs`
    setup.py
setup.py conventionally holds metadata information about the source code within, such as:
what dependencies are needed to install named "install_requires"
what name should be used for package management (install/uninstall "name"), I suggest this match your primary python package name in our case foo, though substituting underscores for hyphens is popular
licensing information
maturity tags (alpha/beta/etc),
audience tags (for developers, for machine learning, etc),
single-page documentation content (like a README),
shell names (names you type at user shell like bash, or names you find in a graphical user shell like a start menu),
a list of python modules this package will install (and uninstall)
a defacto "run tests" entry point python ./setup.py test
It's very expansive; it can even compile C extensions on the fly if a source module is being installed on a development machine. For an everyday example I recommend the PyPA Sample Repository's setup.py
If you are releasing a build artifact, eg a copy of the code that is meant to run on nearly identical computers, a requirements.txt file is a popular way to snapshot exact dependency information, where "install_requires" is a good way to capture minimum and maximum compatible versions. However, given that the target machines are nearly identical anyway, I highly recommend creating a tarball of an entire python prefix. This can be tricky, too detailed to get into here. Check out pip install's --target option, or virtualenv aka venv for leads.
back to the example
how to import a file one directory up:
From foo/spam/eggs.py, if we wanted code from foo/baz we could ask for it by its absolute namespace:
import foo.baz
If we wanted to reserve capability to move eggs.py into some other directory in the future with some other relative baz implementation, we could use a relative import like:
from .. import baz
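One gotcha worth seeing in action: a relative import in foo/spam/eggs.py only works when eggs is loaded as part of the foo package, not when the file is run directly by path. The sketch below rebuilds a cut-down version of the tree above in a temporary directory (the file contents are invented) and runs eggs both ways with the current interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

project = Path(tempfile.mkdtemp())  # stands in for some_directory/
files = {
    "foo/__init__.py": "",
    "foo/baz.py": "VALUE = 'baz here'\n",
    "foo/spam/__init__.py": "",
    # Relative form: two dots climb from foo.spam up to foo.
    "foo/spam/eggs.py": "from .. import baz\nprint(baz.VALUE)\n",
}
for relative_name, source in files.items():
    target = project / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)


def run(arguments):
    """Run the current interpreter from the project directory."""
    return subprocess.run(
        [sys.executable] + arguments,
        cwd=str(project),
        capture_output=True,
        text=True,
    )


as_module = run(["-m", "foo.spam.eggs"])  # eggs knows it belongs to foo.spam
as_script = run(["foo/spam/eggs.py"])     # eggs has no parent package

print("as module:", as_module.stdout.strip())
print("as script:", as_script.stderr.strip().splitlines()[-1])
```

So if a relative import "one directory up" fails, the first thing to check is how the file is being launched, not the dots.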
Here's a three-step, somewhat minimalist version of ThorSummoner's answer for the sake of clarity. It doesn't quite do what I want (I'll explain at the bottom), but it works okay.
Step 1: Make directory and setup.py
filepath_to/project_name/
setup.py
In setup.py, write:
import setuptools
setuptools.setup(name='project_name')
Step 2: Install this directory as a package
Run this code in console:
python -m pip install --editable filepath_to/project_name
Instead of python, you may need to use python3 or something, depending on how your python is installed. Also, you can use -e instead of --editable.
Now, your directory will look more or less like this. I don't know what the egg stuff is.
filepath_to/project_name/
    setup.py
    test_3.egg-info/
        dependency_links.txt
        PKG-INFO
        SOURCES.txt
        top_level.txt
This folder is considered a python package and you can import from files in this parent directory even if you're writing a script anywhere else on your computer.
Step 3. Import from above
Let's say you make two files, one in your project's main directory and another in a sub directory. It'll look like this:
filepath_to/project_name/
    top_level_file.py
    subdirectory/
        subfile.py
    setup.py          |
    test_3.egg-info/  |----- Ignore these guys
        ...           |
Now, if top_level_file.py looks like this:
x = 1
Then I can import it from subfile.py, or really any other file anywhere else on your computer.
# subfile.py OR some_other_python_file_somewhere_else.py
import random # This is a standard package that can be imported anywhere.
import top_level_file # Now, top_level_file.py works similarly.
print(top_level_file.x)
This is different than what I was looking for: I hoped python had a one-line way to import from a file above. Instead, I have to treat the script like a module, do a bunch of boilerplate, and install it globally for the entire python installation to have access to it. It's overkill. If anyone has a simpler method that doesn't involve the above process or importlib shenanigans, please let me know.
Polished answer of @alex-martelli with pathlib:
import pathlib
import sys
_parentdir = pathlib.Path(__file__).parent.parent.resolve()
sys.path.insert(0, str(_parentdir))
import module_in_parent_dir
sys.path.remove(str(_parentdir))
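A possible refinement of the insert/import/remove dance is a context manager, so the path entry is removed even if the import raises. This is only a sketch: prepended_sys_path is a made-up helper name, and the demo uses a temporary directory in place of the real parent directory (in a script you would pass pathlib.Path(__file__).parent.parent instead).

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path


@contextlib.contextmanager
def prepended_sys_path(directory):
    """Temporarily put `directory` at the front of sys.path."""
    entry = str(Path(directory).resolve())
    sys.path.insert(0, entry)
    try:
        yield entry
    finally:
        # Remove the entry again, even if the import inside the block failed.
        try:
            sys.path.remove(entry)
        except ValueError:
            pass


# Demo: a throwaway directory playing the role of the parent directory.
parent = Path(tempfile.mkdtemp())
(parent / "module_in_parent_dir.py").write_text("VALUE = 'from the parent directory'\n")

with prepended_sys_path(parent) as entry:
    importlib.invalidate_caches()
    import module_in_parent_dir

print(module_in_parent_dir.VALUE)
print("still on sys.path:", entry in sys.path)
```

The module stays usable after the block because it is cached in sys.modules; only the search path is restored.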
To run python /myprogram/submodule/mymodule.py which imports /myprogram/mainmodule.py, e.g., via
from mainmodule import *
on Linux (e.g., in the python Docker image), I had to add the program root directory to PYTHONPATH:
export PYTHONPATH=/myprogram
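To check the PYTHONPATH behaviour without touching your shell configuration, here is a sketch that recreates the layout in a temporary directory (standing in for /myprogram; the module contents are invented) and runs the script twice with the current interpreter, once without and once with PYTHONPATH set.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

program_root = Path(tempfile.mkdtemp())  # stands in for /myprogram
(program_root / "mainmodule.py").write_text("GREETING = 'hello from mainmodule'\n")
submodule_dir = program_root / "submodule"
submodule_dir.mkdir()
script = submodule_dir / "mymodule.py"
script.write_text("from mainmodule import *\nprint(GREETING)\n")


def run_script(extra_env):
    """Run mymodule.py in a child process with a controlled PYTHONPATH."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    env.update(extra_env)
    return subprocess.run(
        [sys.executable, str(script)],
        env=env,
        capture_output=True,
        text=True,
    )


without_path = run_script({})
with_path = run_script({"PYTHONPATH": str(program_root)})

print("without PYTHONPATH:", without_path.stderr.strip().splitlines()[-1])
print("with PYTHONPATH:   ", with_path.stdout.strip())
```

Without the variable, only submodule/ (the script's own directory) is searched, so mainmodule one level up is invisible.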
It is 2022 and none of the answers really worked for me. Here is what worked in the end
import sys
sys.path.append('../src')  # the directory that contains my_class.py
import my_class
My directory structure:
src
--my_class.py
notebooks
-- mynotebook.ipynb
I imported my_class from mynotebook.ipynb.
You can use the sys.path.append() method to add the directory containing the package to the list of paths searched for modules. For example, if the package is located two directories above the current directory, you can use the following code:
import sys
sys.path.append("../../")
If the package is located one directory above the current directory, you can use the code below:
import sys
sys.path.append("..")
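One caveat with relative entries such as ".." is that they are resolved against the current working directory of the process, not against the file doing the import. A sketch of a variant that anchors the path to a file instead is below; ancestor_dir and add_ancestor_to_sys_path are made-up helper names, and the script path in the demo is hypothetical (in a real script you would pass __file__).

```python
import sys
import tempfile
from pathlib import Path


def ancestor_dir(file_path, levels_up):
    """Return the directory `levels_up` levels above the one containing file_path.

    levels_up=1 is the equivalent of "..", levels_up=2 of "../..".
    """
    return Path(file_path).resolve().parents[levels_up]


def add_ancestor_to_sys_path(file_path, levels_up):
    """Append that directory to sys.path once, and return it as a string."""
    target = str(ancestor_dir(file_path, levels_up))
    if target not in sys.path:
        sys.path.append(target)
    return target


# Demo with a hypothetical script location; the file need not exist.
base = Path(tempfile.mkdtemp()).resolve()
script = base / "project" / "notebooks" / "analysis.py"

one_up = ancestor_dir(script, 1)
two_up = ancestor_dir(script, 2)
added = add_ancestor_to_sys_path(script, 1)
print(one_up.name, "|", two_up == base, "|", added in sys.path)
```

This gives the same result no matter which directory the program was started from.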
Python is a modular system
Python doesn't rely on a file system
To load python code reliably, have that code in a module, and that module installed in python's library.
Installed modules can always be loaded from the top level namespace with import <name>
There is a great sample project available officially here: https://github.com/pypa/sampleproject
Basically, you can have a directory structure like so:
the_foo_project/
    setup.py
    bar.py  # `import bar`
    foo/
        __init__.py  # `import foo`
        baz.py  # `import foo.baz`
        faz/  # `import foo.faz`
            __init__.py
            daz.py  # `import foo.faz.daz` ... etc.
Be sure to declare your setuptools.setup() in setup.py,
official example: https://github.com/pypa/sampleproject/blob/master/setup.py
In our case we probably want to export bar.py and foo/__init__.py, my brief example:
setup.py
#!/usr/bin/env python3
import setuptools
setuptools.setup(
    ...
    py_modules=['bar'],
    packages=['foo'],
    ...
    entry_points={},
    # Note, any changes to your setup.py, like adding to `packages`, or
    # changing `entry_points` will require the module to be reinstalled;
    # `python3 -m pip install --upgrade --editable ./the_foo_project`
)
Now we can install our module into the python library;
with pip, you can install the_foo_project into your python library in edit mode,
so we can work on it in real time
python3 -m pip install --editable=./the_foo_project
# if you get a permission error, you can always use
# `pip ... --user` to install in your user python library
.
Now from any python context, we can load our shared py_modules and packages
foo_script.py
#!/usr/bin/env python3
import bar
import foo
print(dir(bar))
print(dir(foo))
I've spent hours researching this problem, and I'm still baffled. Please find my ignorance charming.
I'm building a python program that will allow me to pit two AIs against each other in a game of battleship.
Here's my directory structure:
.
├── ais_play_battleship
│ ├── game.py
│ ├── __init__.py
│ ├── player.py
│ └── ship.py
├── LICENSE
├── README.md
└── tests
└── ship_test.py
2 directories, 7 files
Currently, I'm trying to write ship_test.py, but I cannot seem to import ais_play_battleship.ship. I get the dreaded "ModuleNotFoundError"
Here's what my research has taught me about my problem:
If you want to import python code from another directory, you should make that directory a package instead of a module. Therefore, I've placed an __init__.py file in the root of ais_play_battleship.
Python will only search the directory python is launched from as well as the directory of the script you're running. Therefore, I've been trying to launch my tests by running python3 tests/ship_test.py from the root directory.
Here are my specific questions:
Why is the error a "ModuleNotFound" error? Shouldn't it be "PackageNotFound"?
Am I correct to make ais_play_battleship a package?
How can I keep my tests in a separate directory and still use the code in ais_play_battleship?
Please forgive me, as I'm not very good at asking questions on StackOverflow. Please tell me how I can improve.
I am answering my own question, as I haven't yet received a satisfactory answer. The best resource I've found is available here. In summary:
When you run a script by its path, Python does NOT search the directory you run python from for modules; the first entry on sys.path is the directory that contains the script. Furthermore, adding an __init__.py file to make a directory a package is not enough to make it visible to an instance of python running in another folder. You must also install that package. Therefore, the only two ways to access a module in another directory are:
Install the packaged module in site-packages (this requires the creation of a setup.py file and installation using pip install .). More information is available here.
Modify path to resolve the module
I ended up settling with the second option, for reasons discussed below.
The first option requires one to reinstall the package at every change to a package. This is difficult on a constantly-changing codebase, but can be made easier by using build automation. However, I'd like to avoid this added complexity.
I shied away from the second option for a long time, because it seemed that modifying the path would require hard-coding the absolute path to my module, which is obviously unacceptable, as every developer would have to edit that path to fit their environment. However, this guide provides a solution to this problem. Create a ./tests/context.py file with the following contents:
import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
Then, in my ship_test.py module, I imported the context and the module I needed:
import context
import ais_play_battleship.ship
# (I include the submodule ship because ais_play_battleship itself does not have
# any attributes or methods, and the submodule ship is the only one I am testing
# in ship_test.py)
This fits my project better, because it works as expected without having to worry about installing my package (or the method by which my package was installed).
To solve this problem without relying on hacking about your sys.path, create a setup.py file and as a build step for your test runner, have it run pip install . first. You might want to use a tool like tox.
In the top level directory:
setup.py
from setuptools import setup
setup(name='ais_play_battleship', packages=['ais_play_battleship'])
tox.ini
[tox]
envlist = py36, py37
[testenv]
deps=pytest
commands=
    pip install . --quiet
    py.test -q
Then run your tests with tox (in this example we use tox so that we can also control how the test environment is configured): tox
Run tests/ship_test.py as a module
python -m tests.ship_test
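Here is a sketch showing why the -m form works where running the file by path does not. It rebuilds a cut-down copy of the project in a temporary directory and launches the test both ways with the current interpreter. The file contents are invented, and the empty tests/__init__.py is an assumption of this sketch (it is not in the original layout); it is added so that an unrelated installed package named tests cannot shadow the directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stands in for the repository root
files = {
    "ais_play_battleship/__init__.py": "",
    "ais_play_battleship/ship.py": "SIZE = 3\n",
    "tests/__init__.py": "",  # assumption of this sketch, see above
    "tests/ship_test.py": (
        "import ais_play_battleship.ship\n"
        "print('ship size:', ais_play_battleship.ship.SIZE)\n"
    ),
}
for relative_name, source in files.items():
    target = root / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)


def run(arguments):
    """Run the current interpreter from the repository root."""
    return subprocess.run(
        [sys.executable] + arguments,
        cwd=str(root),
        capture_output=True,
        text=True,
    )


# By path: sys.path starts with tests/, so the sibling package is not visible.
as_script = run(["tests/ship_test.py"])
# With -m: sys.path starts with the current directory, i.e. the repository root.
as_module = run(["-m", "tests.ship_test"])

print("by path:", as_script.stderr.strip().splitlines()[-1])
print("with -m:", as_module.stdout.strip())
```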
I've been coding in Python for a long time, but never actually tried to package a piece of code so that I can share it. I started reading
https://python-packaging.readthedocs.io/en/latest/.
I started with the simplest possible case, say I want to share a module named 'clipper', and the only important thing is a class called Clipper. It seems in case I use setuptools I should create somewhere folders
clipper/clipper
and inside the inner clipper, place a file __init__.py
with the definition of class Clipper. So far so good. Theoretically, after installing the package, the way to use the class would be:
import clipper
cl = clipper.Clipper()
My problem is, I am assuming that while I am developing and before any installation, the same code should work. I mean, the previous code should create an instance of the object. But how would that work? How should I set PYTHONPATH so that the previous import would actually work?
Maybe I got something really wrong, I thought packing would be easier compared to coding, but I've spent some time and I don't get it. Any help, please?
Rather than modifying your Python path, install the packaged module as an editable version and your environment will handle this for you. When you have it running as an editable version you'll be able to make changes to the code on your local development instance.
For example, assuming you have Pip you can run the following command in the first 'clipper' folder (the same folder as the setup.py file you created during packaging):
pip install -e .
-e means editable
the . means install the package located in the current folder.
More detail here in an SO answer from 2015: "pip install --editable ./" vs "python setup.py develop"
Your directory tree would look a bit like this:
~/clipper/
    setup.py
    clipper/
        __init__.py
        clipper.py
The setup.py file contains information telling Python how to 'parse' your package. Things like the name of your project, the version and what packages to include are defined here. For your example, setup.py may look like this:
from setuptools import setup  # distutils is deprecated and was removed in Python 3.12
setup(
    name="Clipper",
    # A name for your package, typically your project name
    description="My first package",
    # A short description
    version="1.0.0",
    # A version specification
    packages=["clipper"]
    # A list of packages to include
)
Within clipper, clipper.py contains the actual Clipper class:
class Clipper(object):
    def __init__(self):
        pass
    def foo(self):
        print("Invoked foo!")
__init__.py is a special type of file. It defines the public interface for interacting with your package. Typically, it imports all public functions and classes:
from .clipper import Clipper
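Finally, to turn this into a distributable package, run python3 setup.py sdist. This creates the source distribution for your package (see note 1). Building the sdist is not what makes the package importable, though: the import below works because Python is started from ~/clipper/, so the inner clipper directory is found via the current directory. Let's try that now. Navigate back to ~/clipper/ and start Python: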
>>> from clipper import Clipper
>>> c = Clipper()
>>> c.foo()
Invoked foo!
>>>
And here's a 'real' example of what a package directory would look like:
~/calculator/
    setup.py
    calculator/
        __init__.py
        add.py
        subtract.py
setup.py
from setuptools import setup
setup(
    name="Calculator",
    description="Calculate stuff!",
    version="1.0.0",
    packages=["calculator"]
)
__init__.py
from .add import *
from .subtract import *
add.py
def add(a, b):
    """Return `a` + `b`."""
    return a + b
subtract.py
def subtract(a, b):
    """Return `a` - `b`."""
    return a - b
For more information, see the Python tutorial on packaging.
1 You may get some warnings about missing information, but you can ignore that for now.
I have a repository I inherited used by a lot of teams, lots of scripts call it, and it seems like its going to be a real headache to make any structural changes to it. I would like to make this repo installable somehow. It is structured like this:
my_repo/
scripts.py
If it was my repository, I would change the structure like so and make it installable, and run python setup.py install:
my_repo/
    setup.py
    my_repo/
        __init__.py
        scripts.py
If this is not feasible (and it sounds like it might not be), can I somehow do something like:
my_repo/
setup.py
__init__.py
scripts.py
And add something to setup.py to let it know that the repo is structured funny like this, so that I can install it?
You can do what you suggest.
my_repo/
setup.py
__init__.py
scripts.py
The only thing is you will need to import modules in your package via their name if they are in the base level. So for example if your structure looked like this:
my_repo/
    setup.py
    __init__.py
    scripts.py
    lib.py
    pkg/
        __init__.py
        pkgmodule.py
Then your imports in scripts.py might look like
from lib import func1, func2
from pkg.pkgmodule import stuff1, stuff2
So in your base directory imports are essentially by module name not by package. This could screw up some of your other packages namespaces if you're not careful, like if there is another dependency with a package named lib. So it would be best if you have these scripts running in a virtualenv and if you test to ensure namespacing doesn't get messed up
There is a directive in the setup.py file to set the name of a package to install and where it should get its modules from. That would let you use the desired directory structure. For instance, with a given directory structure such as:
my_repo/
setup.py
__init__.py
scripts.py
You could write a setup.py such as:
from setuptools import setup
setup(
    # -- Package structure ----
    packages=['my_repo'],
    package_dir={'my_repo': '.'})
Thus anyone installing the contents of my_repo with the command "./setup.py install" or "pip install ." would end up with an installed copy of my_repo's modules.
As a side note, relative imports work differently in Python 2 and Python 3. In Python 3, a relative import must be written explicitly (for example, from . import scripts). This method of installing my_repo works in Python 3 when the modules are imported absolutely:
from my_repo import scripts
Is there a standard way to associate version string with a Python package in such way that I could do the following?
import foo
print(foo.version)
I would imagine there's some way to retrieve that data without any extra hardcoding, since minor/major strings are specified in setup.py already. An alternative solution that I found was to import __version__ in my foo/__init__.py and then have __version__.py generated by setup.py.
Not directly an answer to your question, but you should consider naming it __version__, not version.
This is a quasi-standard: many modules in the standard library use __version__, and so do lots of third-party modules.
Usually, __version__ is a string, but sometimes it's also a float or tuple.
As mentioned by S.Lott (Thank you!), PEP 8 says it explicitly:
Module Level Dunder Names
Module level "dunders" (i.e. names with two leading and two trailing
underscores) such as __all__, __author__, __version__, etc.
should be placed after the module docstring but before any import
statements except from __future__ imports.
You should also make sure that the version number conforms to the format described in PEP 440 (PEP 386 was a previous version of this standard).
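As a rough illustration of what conforming to PEP 440 means, here is a small check for canonical public version strings. The pattern is adapted from the one in the appendix of PEP 440; it deliberately does not cover local versions ("+something") or the non-normalized spellings that PEP 440 also accepts, so treat it as a sketch rather than a full validator. The function name is_canonical is just for illustration:

```python
import re

# Canonical public version identifiers only (adapted from the PEP 440 appendix).
_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version):
    """Return True if `version` is a canonical PEP 440 public version string."""
    return _CANONICAL.match(version) is not None


if __name__ == "__main__":
    for candidate in ("3.6.5", "1.0rc1", "v1.0", "1.0-beta"):
        print(candidate, is_canonical(candidate))
```

If you can afford a third-party dependency, the packaging library on PyPI provides a complete parser, which is a safer choice than a hand-written pattern.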
I use a single _version.py file as the "one canonical place" to store version information:
1. It provides a __version__ attribute.
2. It provides the standard metadata version. Therefore it will be detected by pkg_resources or other tools that parse the package metadata (EGG-INFO and/or PKG-INFO, PEP 0345).
3. It doesn't import your package (or anything else) when building your package, which can cause problems in some situations. (See the comments below about what problems this can cause.)
4. There is only one place that the version number is written down, so there is only one place to change it when the version number changes, and there is less chance of inconsistent versions.
Here is how it works: the "one canonical place" to store the version number is a .py file named "_version.py" which is in your Python package, for example in myniftyapp/_version.py. This file is a Python module, but your setup.py doesn't import it! (That would defeat feature 3.) Instead, your setup.py knows that the contents of this file are very simple, something like:
__version__ = "3.6.5"
And so your setup.py opens the file and parses it, with code like:
import re
VERSIONFILE = "myniftyapp/_version.py"
with open(VERSIONFILE, "rt") as f:
    verstrline = f.read()
VSRE = r"^__version__ = ['\"]([^'\"]*)['\"]"
mo = re.search(VSRE, verstrline, re.M)
if mo:
    verstr = mo.group(1)
else:
    raise RuntimeError("Unable to find version string in %s." % (VERSIONFILE,))
Then your setup.py passes that string as the value of the "version" argument to setup(), thus satisfying feature 2.
To satisfy feature 1, you can have your package (at run-time, not at setup time!) import the _version file from myniftyapp/__init__.py like this:
from ._version import __version__  # explicit relative import, required on Python 3
Here is an example of this technique that I've been using for years.
The code in that example is a bit more complicated, but the simplified example that I wrote into this comment should be a complete implementation.
Here is example code of importing the version.
If you see anything wrong with this approach, please let me know.
Rewritten 2017-05
After 13+ years of writing Python code and managing various packages, I came to the conclusion that DIY is maybe not the best approach.
I started using the pbr package for dealing with versioning in my packages. If you are using git as your SCM, this will fit into your workflow like magic, saving you weeks of work (you will be surprised about how complex the issue can be).
As of today, pbr has 12M monthly downloads, and reaching this level didn't include any dirty tricks. It was only one thing -- fixing a common packaging problem in a very simple way.
pbr can take on more of the package maintenance burden, and is not limited to versioning, but it does not force you to adopt all its benefits.
To give you an idea of what adopting pbr in one commit looks like, have a look at switching packaging to pbr.
You have probably noticed that the version is not stored in the repository at all. pbr detects it from Git branches and tags.
No need to worry about what happens when you do not have a git repository because pbr does "compile" and cache the version when you package or install the applications, so there is no runtime dependency on git.
Old solution
Here is the best solution I've seen so far and it also explains why:
Inside yourpackage/version.py:
# Store the version here so:
# 1) we don't load dependencies by storing it in __init__.py
# 2) we can import it in setup.py for the same reason
# 3) we can import it into your module
__version__ = '0.12'
Inside yourpackage/__init__.py:
from .version import __version__
Inside setup.py:
exec(open('yourpackage/version.py').read())
setup(
...
version=__version__,
...
If you know another approach that seems to be better let me know.
Per the deferred [STOP PRESS: rejected] PEP 396 (Module Version Numbers), there is a proposed way to do this. It describes, with rationale, an (admittedly optional) standard for modules to follow. Here's a snippet:
When a module (or package) includes a version number, the version SHOULD be available in the __version__ attribute.
For modules which live inside a namespace package, the module SHOULD include the __version__ attribute. The namespace package itself SHOULD NOT include its own __version__ attribute.
The __version__ attribute's value SHOULD be a string.
There is a slightly simpler alternative to some of the other answers:
__version_info__ = ('1', '2', '3')
__version__ = '.'.join(__version_info__)
(And it would be fairly simple to convert auto-incrementing portions of version numbers to a string using str().)
Of course, from what I've seen, people tend to use something like the previously-mentioned version when using __version_info__, and as such store it as a tuple of ints; however, I don't quite see the point in doing so, as I doubt there are situations where you would perform mathematical operations such as addition and subtraction on portions of version numbers for any purpose besides curiosity or auto-incrementation (and even then, int() and str() can be used fairly easily). (On the other hand, there is the possibility of someone else's code expecting a numerical tuple rather than a string tuple and thus failing.)
This is, of course, my own view, and I would gladly welcome others' input on using a numerical tuple.
As shezi reminded me, (lexical) comparisons of number strings do not necessarily have the same result as direct numerical comparisons; leading zeroes would be required to provide for that. So in the end, storing __version_info__ (or whatever it would be called) as a tuple of integer values would allow for more efficient version comparisons.
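The difference between lexical and numerical comparison is easy to see with two concrete versions. This is a minimal sketch that assumes purely numeric dotted versions; the helper name parse_version_info is made up for illustration:

```python
def parse_version_info(version):
    """Split a purely numeric dotted version such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    old, new = "1.9.0", "1.10.0"
    # Strings compare character by character, so "9" sorts after "1".
    print(old < new)  # False
    # Tuples of ints compare element by element, numerically.
    print(parse_version_info(old) < parse_version_info(new))  # True
```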
Many of these solutions here ignore git version tags which still means you have to track version in multiple places (bad). I approached this with the following goals:
Derive all python version references from a tag in the git repo
Automate git tag/push and setup.py upload steps with a single command that takes no inputs.
How it works:
From a make release command, the last tagged version in the git repo is found and incremented. The tag is pushed back to origin.
The Makefile stores the version in src/_version.py where it will be read by setup.py and also included in the release. Do not check _version.py into source control!
setup.py command reads the new version string from package.__version__.
Details:
Makefile
# remove optional 'v' and trailing hash "v1.0-N-HASH" -> "v1.0-N"
git_describe_ver = $(shell git describe --tags | sed -E -e 's/^v//' -e 's/(.*)-.*/\1/')
git_tag_ver = $(shell git describe --abbrev=0)
next_patch_ver = $(shell python versionbump.py --patch $(call git_tag_ver))
next_minor_ver = $(shell python versionbump.py --minor $(call git_tag_ver))
next_major_ver = $(shell python versionbump.py --major $(call git_tag_ver))
.PHONY: ${MODULE}/_version.py
${MODULE}/_version.py:
	echo '__version__ = "$(call git_describe_ver)"' > $@
.PHONY: release
release: test lint mypy
	git tag -a $(call next_patch_ver)
	$(MAKE) ${MODULE}/_version.py
	python setup.py check sdist upload # (legacy "upload" method)
	# twine upload dist/* (preferred method)
	git push origin master --tags
The release target always increments the 3rd version digit, but you can use the next_minor_ver or next_major_ver to increment the other digits. The commands rely on the versionbump.py script that is checked into the root of the repo
versionbump.py
"""An auto-increment tool for version strings."""
import sys
import unittest
import click
from click.testing import CliRunner  # type: ignore
__version__ = '0.1'
MIN_DIGITS = 2
MAX_DIGITS = 3
@click.command()
@click.argument('version')
@click.option('--major', 'bump_idx', flag_value=0, help='Increment major number.')
@click.option('--minor', 'bump_idx', flag_value=1, help='Increment minor number.')
@click.option('--patch', 'bump_idx', flag_value=2, default=True, help='Increment patch number.')
def cli(version: str, bump_idx: int) -> None:
    """Bumps a MAJOR.MINOR.PATCH version string at the specified index location or 'patch' digit. An
    optional 'v' prefix is allowed and will be included in the output if found."""
    prefix = version[0] if version[0].isalpha() else ''
    digits = version.lower().lstrip('v').split('.')
    if len(digits) > MAX_DIGITS:
        click.secho('ERROR: Too many digits', fg='red', err=True)
        sys.exit(1)
    digits = (digits + ['0'] * MAX_DIGITS)[:MAX_DIGITS]  # Extend total digits to max.
    digits[bump_idx] = str(int(digits[bump_idx]) + 1)  # Increment the desired digit.
    # Zero rightmost digits after bump position.
    for i in range(bump_idx + 1, MAX_DIGITS):
        digits[i] = '0'
    digits = digits[:max(MIN_DIGITS, bump_idx + 1)]  # Trim rightmost digits.
    click.echo(prefix + '.'.join(digits), nl=False)
if __name__ == '__main__':
    cli()  # pylint: disable=no-value-for-parameter
This does the heavy lifting of processing and incrementing the version number from git.
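To see what the bump logic does without installing click, the same steps can be written as a plain function. This is only a sketch that mirrors the body of cli() above; the function name bump is hypothetical, and it raises ValueError where the script prints an error and exits:

```python
MIN_DIGITS = 2
MAX_DIGITS = 3


def bump(version, bump_idx=2):
    """Bump a MAJOR.MINOR.PATCH string at index bump_idx (0=major, 1=minor, 2=patch)."""
    prefix = version[0] if version[0].isalpha() else ""
    digits = version.lower().lstrip("v").split(".")
    if len(digits) > MAX_DIGITS:
        raise ValueError("Too many digits")
    digits = (digits + ["0"] * MAX_DIGITS)[:MAX_DIGITS]  # pad to three parts
    digits[bump_idx] = str(int(digits[bump_idx]) + 1)  # increment the chosen part
    for i in range(bump_idx + 1, MAX_DIGITS):  # zero everything to its right
        digits[i] = "0"
    digits = digits[:max(MIN_DIGITS, bump_idx + 1)]  # trim trailing parts
    return prefix + ".".join(digits)


if __name__ == "__main__":
    print(bump("v0.0.1"))     # v0.0.2
    print(bump("1.2.3", 1))   # 1.3
    print(bump("v1.2.3", 0))  # v2.0
```

Note that a minor or major bump drops the trailing patch part ("1.3" rather than "1.3.0"), because the result is trimmed to at least MIN_DIGITS parts.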
__init__.py
The my_module/_version.py file is imported into my_module/__init__.py. Put any static install config here that you want distributed with your module.
from ._version import __version__
__author__ = ''
__email__ = ''
setup.py
The last step is to read the version info from the my_module module.
from setuptools import setup, find_packages
pkg_vars = {}
with open("{MODULE}/_version.py") as fp:
    exec(fp.read(), pkg_vars)
setup(
    version=pkg_vars['__version__'],
    ...
    ...
)
Of course, for all of this to work you'll have to have at least one version tag in your repo to start.
git tag -a v0.0.1
I use a JSON file in the package dir. This fits Zooko's requirements.
Inside pkg_dir/pkg_info.json:
{"version": "0.1.0"}
Inside setup.py:
from setuptools import setup  # distutils was removed from the standard library in Python 3.12
import json
with open('pkg_dir/pkg_info.json') as fp:
    _info = json.load(fp)
setup(
    version=_info['version'],
    ...
)
Inside pkg_dir/__init__.py:
import json
from os.path import dirname
with open(dirname(__file__) + '/pkg_info.json') as fp:
    _info = json.load(fp)
__version__ = _info['version']
I also put other information in pkg_info.json, like author. I like to use JSON because I can automate management of metadata.
Lots of work toward uniform versioning and in support of conventions has been completed since this question was first asked. Palatable options are now detailed in the Python Packaging User Guide. Also noteworthy is that version number schemes are relatively strict in Python per PEP 440, and so keeping things sane is critical if your package will be released to the Cheese Shop.
Here's a shortened breakdown of versioning options:
1. Read the file in setup.py (setuptools) and get the version.
2. Use an external build tool (to update both __init__.py as well as source control), e.g. bump2version, changes or zest.releaser.
3. Set the value to a __version__ global variable in a specific module.
4. Place the value in a simple VERSION text file for both setup.py and code to read.
5. Set the value via a setup.py release, and use importlib.metadata to pick it up at runtime. (Warning: importlib.metadata is in the standard library only from Python 3.8; earlier versions need the importlib_metadata backport.)
6. Set the value to __version__ in sample/__init__.py and import sample in setup.py.
7. Use setuptools_scm to extract versioning from source control so that it's the canonical reference, not code.
NOTE that (7) might be the most modern approach (build metadata is independent of code, published by automation). Also NOTE that if setup.py is used for package release, a simple python3 setup.py --version will report the version directly.
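For the importlib.metadata option, the difference between Python versions can be handled with an import fallback. A minimal sketch, assuming the importlib_metadata backport from PyPI is installed on interpreters older than 3.8; the helper name installed_version and the "0+unknown" placeholder are arbitrary choices for illustration:

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Older interpreters: assumes the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name, default="0+unknown"):
    """Return the installed distribution's version, or `default` if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return default


if __name__ == "__main__":
    print(installed_version("sample"))
```

The fallback value matters when the code runs from a source checkout that was never installed, since there is no metadata to read in that case.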
Also worth noting: as well as __version__ being a semi-standard in Python, so is __version_info__, which is a tuple. In simple cases you can just do something like:
__version__ = '1.2.3'
__version_info__ = tuple([ int(num) for num in __version__.split('.')])
...and you can get the __version__ string from a file, or whatever.
There doesn't seem to be a standard way to embed a version string in a Python package. Most packages I've seen use some variant of your solution, i.e. either
Embed the version in setup.py and have setup.py generate a module (e.g. version.py) containing only version info, that's imported by your package, or
The reverse: put the version info in your package itself, and import that to set the version in setup.py
arrow handles it in an interesting way.
Now (since 2e5031b)
In arrow/__init__.py:
__version__ = 'x.y.z'
In setup.py:
from arrow import __version__
setup(
name='arrow',
version=__version__,
# [...]
)
Before
In arrow/__init__.py:
__version__ = 'x.y.z'
VERSION = __version__
In setup.py:
def grep(attrname):
    pattern = r"{0}\W*=\W*'([^']+)'".format(attrname)
    strval, = re.findall(pattern, file_text)
    return strval
file_text = read(fpath('arrow/__init__.py'))
setup(
name='arrow',
version=grep('__version__'),
# [...]
)
I also saw another style:
>>> django.VERSION
(1, 1, 0, 'final', 0)
After several hours of trying to find the simplest reliable solution, here are the parts:
create a version.py file INSIDE the folder of your package "/mypackage":
# Store the version here so:
# 1) we don't load dependencies by storing it in __init__.py
# 2) we can import it in setup.py for the same reason
# 3) we can import it into your module
__version__ = '1.2.7'
in setup.py:
exec(open('mypackage/version.py').read())
setup(
name='mypackage',
version=__version__,
in the package's __init__.py:
from .version import __version__
The exec() function runs the script outside of any imports, since setup.py is run before the module can be imported. You still only need to manage the version number in one file in one place, but unfortunately it is not in setup.py. (that's the downside, but having no import bugs is the upside)
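A small variation on the exec() approach is to execute the version file into its own dictionary instead of setup.py's globals, so nothing else defined in that file leaks into setup.py. This is a sketch, not part of the recipe above; the helper name read_version is made up, and the demo writes a throwaway file standing in for mypackage/version.py:

```python
import os
import tempfile


def read_version(path):
    """Execute a version file in an isolated namespace and return its __version__."""
    namespace = {}
    with open(path) as fp:
        exec(fp.read(), namespace)
    return namespace["__version__"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "version.py")
        with open(path, "w") as fp:
            fp.write("__version__ = '1.2.7'\n")
        print(read_version(path))
```

In setup.py you would then pass version=read_version('mypackage/version.py') to setup().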
pbr with bump2version
This solution was derived from this article
The use case - python GUI package distributed via PyInstaller. Needs to show version info.
Here is the structure of the project packagex
packagex
├── packagex
│ ├── __init__.py
│ ├── main.py
│ └── _version.py
├── packagex.spec
├── LICENSE
├── README.md
├── .bumpversion.cfg
├── requirements.txt
├── setup.cfg
└── setup.py
where setup.py is
# setup.py
import os
import setuptools
about = {}
with open("packagex/_version.py") as f:
    exec(f.read(), about)
os.environ["PBR_VERSION"] = about["__version__"]
setuptools.setup(
    setup_requires=["pbr"],
    pbr=True,
    version=about["__version__"],
)
packagex/_version.py contains just
__version__ = "0.0.1"
and packagex/__init__.py
from ._version import __version__
and for .bumpversion.cfg
[bumpversion]
current_version = 0.0.1
commit = False
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<build>\d+))?
serialize =
    {major}.{minor}.{patch}-{release}{build}
    {major}.{minor}.{patch}
[bumpversion:part:release]
optional_value = prod
first_value = dev
values =
    dev
    prod
[bumpversion:file:packagex/_version.py]
Using setuptools and pbr
There is not a standard way to manage version, but the standard way to manage your packages is setuptools.
The best solution I've found overall for managing version is to use setuptools with the pbr extension. This is now my standard way of managing version.
Setting up your project for full packaging may be overkill for simple projects, but if you need to manage version, you are probably at the right level to just set everything up. Doing so also makes your package releasable on PyPI so everyone can download and use it with pip.
PBR moves most metadata out of the setup.py tools and into a setup.cfg file that is then used as a source for most metadata, which can include version. This allows the metadata to be packaged into an executable using something like pyinstaller if needed (if so, you will probably need this info), and separates the metadata from the other package management/setup scripts. You can directly update the version string in setup.cfg manually, and it will be pulled into the *.egg-info folder when building your package releases. Your scripts can then access the version from the metadata using various methods (these processes are outlined in sections below).
When using Git for VCS/SCM, this setup is even better, as it will pull in a lot of the metadata from Git so that your repo can be your primary source of truth for some of the metadata, including version, authors, changelogs, etc. For version specifically, it will create a version string for the current commit based on git tags in the repo.
PyPA - Packaging Python Packages with SetupTools - Tutorial
PBR latest build usage documentation - How to setup an 8-line setup.py and a setup.cfg file with the metadata.
Because PBR pulls version, author, changelog and other info directly from your git repo, some of the metadata in setup.cfg can be left out and is generated automatically whenever a distribution is created for your package (using setup.py).
Get the current version in real-time
setuptools will pull the latest info in real-time using setup.py:
python setup.py --version
This will pull the latest version either from the setup.cfg file, or from the git repo, based on the latest commit that was made and tags that exist in the repo. This command doesn't update the version in a distribution though.
Updating the version metadata
When you create a distribution with setup.py (e.g. python setup.py sdist), all the current info is extracted and stored in the distribution. This essentially runs the setup.py --version command and then stores that version info in the package.egg-info folder, in a set of files that hold the distribution metadata.
Note on process to update version meta-data:
If you are not using pbr to pull version data from git, then just update your setup.cfg directly with new version info (easy enough, but make sure this is a standard part of your release process).
If you are using git, and you don't need to create a source or binary distribution (using python setup.py sdist or one of the python setup.py bdist_xxx commands) the simplest way to update the git repo info into your <mypackage>.egg-info metadata folder is to just run the python setup.py install command. This will run all the PBR functions related to pulling metadata from the git repo and update your local .egg-info folder, install script executables for any entry-points you have defined, and other functions you can see from the output when you run this command.
Note that the .egg-info folder is generally excluded from the git repo itself by standard Python .gitignore files (such as from Gitignore.IO), as it can be generated from your source. If it is excluded, make sure you have a standard "release process" to get the metadata updated locally before release; any package you upload to PyPI or otherwise distribute must include this data to have the correct version. If you want the Git repo to contain this info, you can exclude specific files from being ignored (e.g. add !*.egg-info/PKG-INFO to .gitignore).
Accessing the version from a script
You can access the metadata from the current build within Python scripts in the package itself. For version, for example, there are several ways to do this I have found so far:
## This one is new in the standard library as of Python 3.8.0 and should become the standard
from importlib.metadata import version
v0 = version("mypackage")
print('v0 {}'.format(v0))
## I don't like this one because the version method is hidden
import pkg_resources # part of setuptools
v1 = pkg_resources.require("mypackage")[0].version
print('v1 {}'.format(v1))
# Probably best for pre v3.8.0 - the output without .version is just a longer string with
# both the package name, a space, and the version string
import pkg_resources # part of setuptools
v2 = pkg_resources.get_distribution('mypackage').version
print('v2 {}'.format(v2))
## This one seems to be slower, and with pyinstaller makes the exe a lot bigger
from pbr.version import VersionInfo
v3 = VersionInfo('mypackage').release_string()
print('v3 {}'.format(v3))
You can put one of these directly in your __init__.py for the package to extract the version info as follows, similar to some other answers:
__all__ = (
'__version__',
'my_package_name'
)
import pkg_resources # part of setuptools
__version__ = pkg_resources.get_distribution("mypackage").version
Create a file named _version.txt in the same folder as __init__.py and write the version as a single line:
0.8.2
Read this information from the file _version.txt in __init__.py:
import os
def get_version():
    with open(os.path.join(os.path.abspath(os.path.dirname(__file__)), "_version.txt")) as f:
        return f.read().strip()
__version__ = get_version()
I described a standard and modern way here, relying on setuptools_scm.
This pattern has worked successfully for dozens of published packages over the past years, so I can warmly recommend it.
Note that you do not need the getversion package to implement this pattern. It just happens that the getversion documentation hosts this tip.
I prefer to read the package version from installation environment.
This is my src/foo/_version.py:
from pkg_resources import get_distribution
__version__ = get_distribution('foo').version
Make sure foo is always installed first; that is why a src/ layer is required, to prevent foo from being imported without installation.
In the setup.py, I use setuptools-scm to generate the version automatically.
Update in 2022.7.5:
There is another way, which is my favourite now: use setuptools-scm to generate a _version.py file.
setup(
    ...
    use_scm_version={
        'write_to':
        'src/foo/_version.py',
        'write_to_template':
        '"""Generated version file."""\n'
        '__version__ = "{version}"\n',
    },
)
Using setuptools and pyproject.toml
Setuptools now offers a way to dynamically get version in pyproject.toml
Reproducing the example here, you can create something like the following in your pyproject.toml
# ...
[project]
name = "my_package"
dynamic = ["version"]
# ...
[tool.setuptools.dynamic]
version = {attr = "my_package.__version__"}
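As far as I understand the setuptools documentation, the attr directive first tries to find the value by static analysis of the module and only falls back to importing it. The following is a rough sketch of what such a static lookup can look like using the standard-library ast module. It is not setuptools' actual implementation, and the function name static_version is made up for illustration:

```python
import ast


def static_version(source, name="__version__"):
    """Find a module-level `name = <literal>` assignment without executing the source."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # literal_eval raises ValueError if the value is not a plain literal.
                    return ast.literal_eval(node.value)
    raise LookupError("no literal assignment to %s found" % name)


if __name__ == "__main__":
    print(static_version('import os\n__version__ = "1.2.3"\n'))
```

The practical consequence is that keeping __version__ a plain string literal in my_package/__init__.py avoids importing the package (and its dependencies) at build time.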
Use a version.py file containing only __version__ = <VERSION>. In setup.py, import __version__ and pass its value to setup() like this:
version=__version__
Another way is to use just a setup.py file with version=<CURRENT_VERSION> - the CURRENT_VERSION is hardcoded.
Since we don't want to manually change the version in the file every time we create a new tag (ready to release a new package version), we can use the following..
I highly recommend bumpversion package. I've been using it for years to bump a version.
start by adding version=<VERSION> to your setup.py file if you don't have it already.
You should use a short script like this every time you bump a version:
bumpversion (patch|minor|major) - choose only one option
git push
git push --tags
Then add one file per repo called: .bumpversion.cfg:
[bumpversion]
current_version = <CURRENT_TAG>
commit = True
tag = True
tag_name = {new_version}
[bumpversion:file:<RELATIVE_PATH_TO_SETUP_FILE>]
Note:
You can use __version__ parameter under version.py file like it was suggested in other posts and update the bumpversion file like this:
[bumpversion:file:<RELATIVE_PATH_TO_VERSION_FILE>]
You must git commit or git reset everything in your repo, otherwise you'll get a dirty repo error.
Make sure that your virtual environment has the bumpversion package installed; without it this will not work.
For what it's worth, if you're using NumPy distutils, numpy.distutils.misc_util.Configuration has a make_svn_version_py() method that embeds the revision number inside package.__svn_version__ in the variable version.
If you use CVS (or RCS) and want a quick solution, you can use:
__version__ = "$Revision: 1.1 $"[11:-2]
__version_info__ = tuple([int(s) for s in __version__.split(".")])
(Of course, the revision number will be substituted for you by CVS.)
This gives you a print-friendly version and a version info that you can use to check that the module you are importing has at least the expected version:
import my_module
assert my_module.__version_info__ >= (1, 1)