Python: Distributing a file/module into another package with setuptools

Generically, I'm trying to distribute a module/file that I want to be part of another Python package that has already set up a directory structure under site-packages.
For the given structure under site-packages:
-- main_package
   -- __init__.py
   -- sub_package
      -- __init__.py
      -- bar.py
I have a module foo.py that I want to exist next to the module bar.py, both under the package sub_package. If I just set up that structure in my repo along the lines of a typical package, the __init__.py files that setuptools requires clobber the existing __init__.py files for main_package and sub_package. Ideally this would be a wheel or sdist that could be distributed from our own PyPI.
I've tried using py_modules, but I can't get it to place the module anywhere other than the top level under site-packages.
Specifically, I'm trying to distribute a SaltStack external_pillar into the salt/pillar/ structure already set up by Salt. There are multiple Salt instances that may or may not want the external_pillar, so simply bundling the pillar into the Salt distribution we have isn't feasible.

You may be looking for namespace packages.
In your case that would mean a structure like this:
`-- main_package
    |   # no __init__.py here
    `-- sub_package
        |-- __init__.py
        `-- foo.py
Python 3.3 added implicit namespace packages from PEP 420. All that is required to create a native namespace package is that you just omit __init__.py from the namespace package directory.
...
It is extremely important that every distribution that uses the namespace package omits the __init__.py or uses a pkgutil-style __init__.py. If any distribution does not, it will cause the namespace logic to fail and the other sub-packages will not be importable.
...
Finally, every distribution must provide the namespace_packages argument to setup() in setup.py.
However, from your question I am not sure whether sub_package already exists in main_package or not.
If it does, I am not sure of the structure to use to extend sub_package without shadowing its modules; you may need to use an alternate sub-package name to achieve this.
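A minimal setup.py for such a distribution might look like this (a sketch only; the distribution name and version are placeholders, and it assumes the repo contains main_package/sub_package/foo.py, with the __init__.py files either omitted per PEP 420 or kept pkgutil-style as the quoted docs describe):

from setuptools import setup

setup(
    name='main-package-foo',  # placeholder distribution name
    version='0.1.0',
    # declare both levels as shared namespaces so this distribution
    # can coexist with the one that ships bar.py
    namespace_packages=['main_package', 'main_package.sub_package'],
    packages=['main_package', 'main_package.sub_package'],
)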

Related

python namespaces vs packages: making a package the default namespace

I have a project with an overarching namespace, with packages inside it. Here's the folder structure:
pypackage
├── pypackage           <-- Source code for use in this project.
│   │
│   ├── bin             <-- Module: CLI entry point into pypackage.
│   │   ├── __init__.py
│   │   └── pypackage.py
│   │
│   └── core            <-- Module: Core functionality.
│       ├── __init__.py
│       └── pypackage.py
│
├── tests
├── README.md
└── setup.py
Pretty simple. If I want to import it I use:
from pypackage.core import pypackage
and it works great because my setup.py looks like this:
from setuptools import setup, find_packages
...
NAME = 'pypackage'

setup(
    name=NAME,
    namespace_packages=[NAME],
    packages=[f'{NAME}.{p}' for p in find_packages(where=NAME)],
    entry_points={
        "console_scripts": [
            f'{NAME} = {NAME}.bin.{NAME}:cli',
        ]
    },
    ...
)
However, I have legacy code that imports pypackage from when it used to be a stand-alone Python file, like this:
import pypackage
So how do I make it so I can keep the same structure with namespaces and subpackages but still import it the old way? How do I turn this:
from pypackage.core import pypackage
into this:
import pypackage
In other words, how do I alias the pypackage.core.pypackage module to be pypackage for when I'm importing pypackage into an external project?
You would add the 'old' names inside your new package by importing into the top-level package.
Names imported as globals in pypackage/__init__.py are attributes on the pypackage package. Make use of that to give access to 'legacy' locations:
# add all public names from pypackage.core.pypackage to the top level for
# legacy package use
from .core.pypackage import *
Now any code that uses import pypackage can use pypackage.foo and pypackage.bar if in reality these objects were defined in pypackage.core.pypackage instead.
Now, because pypackage is a setuptools namespace package you have a different problem; namespace packages are there for multiple separate distributions to install into so that top-level package must either be empty or only contain a minimum __init__.py file (namespace packages created with empty directories require Python 3.3).
If you are the only publisher of distributions that use this namespace, you can cheat a little here and use a pkgutil-style __init__.py file with the additional import I used above, but then you must either not use any __init__.py files in the other distribution packages or require that they all use the exact same __init__.py content. Coordination is key here.
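If you take that route, the combined pkgutil-style __init__.py might look like this (a sketch merging the namespace handling with the legacy re-export shown above):

# pypackage/__init__.py
# pkgutil-style namespace handling: merge any other pypackage.* distributions
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

# legacy re-export so `import pypackage` keeps exposing the core names
from .core.pypackage import *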
Or you would have to use a different approach. Leave pypackage as a legacy wrapper module, and rename the new package format to use a new, different top-level name that can live next to the old module. At this point you can then just include the legacy package in your project, directly, as an extra top-level module.
Martijn Pieters has the right idea if I were using regular packages, but a namespace package is a setuptools thing.
So that didn't work. After more research, I learned that there's no way to do what I'm trying to do: if I really want it, I must convert everything to a regular package hierarchy instead of a namespace package, then use Martijn's solution.
I've decided to modify the legacy code instead to import it the new way.

Template for Python Package Index (PyPI) submission

I'm writing a couple of packages that I'd like to release on PyPi for other people to use.
I've not released to PyPi before so I have been mocking up a submission template: https://github.com/chris-brown-nz/pypi-package-template
Here's a tree of the project template:
|   MANIFEST.in
|   README.rst
|   setup.cfg
|   setup.py
|
\---package
        module_one.py
        module_three.py
        module_two.py
        __init__.py
In terms of interacting with the package, this is what I would usually do - is it the best way?
To run a method:
from package import module_one
module_one.ClassOne().method_a()
To get a value from a method:
from package import module_two
print(module_two.ClassFive().method_e())
To set then use an attribute of an instance:
from package import module_three
cls = module_three.ClassSeven("Hello World")
print(cls.value)
'package' is a reserved name obviously and won't be used in the final project.
I'd be grateful for some feedback on how I've structured my project and whether it is considered standard, or if it should be modified in some way.
There are different approaches to this; whether one or the other is better depends on how you want to develop, on the usage of the package (e.g. whether you ever install it using pip install -e package_name), etc.
What is missing from your tree is the name of the directory where the setup.py resides, and that is usually the package name:
└── package
    ├── package
    │   ├── __init__.py
    │   ├── module_one.py
    │   ├── module_three.py
    │   └── module_two.py
    ├── MANIFEST.in
    ├── README.rst
    ├── setup.cfg
    └── setup.py
As you can see you are doubling the 'package' name, which means that your setup.py has to be adapted for each package, or has to dynamically determine the name of the directory where the module.py files reside. If you go this route, I would suggest you put the module.py files in a generically named directory like 'src' or 'lib'.
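For reference, a minimal setup.py matching the doubled layout above could be as simple as this (a sketch; name and version are placeholders):

from setuptools import setup, find_packages

setup(
    name='package',  # placeholder, as in the question
    version='0.1.0',
    packages=find_packages(exclude=['tests*']),
)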
I don't like the above "standard" setup for multiple reasons:
it doesn't map well to how Python programs "grow" before they are split up into packages. Before splitting up, having such a 'src' directory would mean using:
from package.src.module_one import MyModuleOneClass
Instead you would have your module.py files directly under package.
Having a setup.py to control installation, a README.rst for documentation and an __init__.py to satisfy Python's import is one thing, but all the other stuff, apart from your module.py files containing the actual functionality, is garbage: garbage that might be needed at some point during the package creation process, but that is not necessary for the package's functionality.
There are other considerations, such as being able to access the version number of the package from the setup.py as well as from the program, without the former having to import the package itself (which may lead to install complications), nor having another extra version.py file that needs importing.
In particular I always found the transition from using a directory structure under site-packages that looked like:
└── organisation
    ├── package1
    └── package2
        ├── subpack1
        └── subpack2
and that could intuitively be used both for importing and for navigating to source files, to something like:
├── organisation_package1
│   └── src
├── organisation_package2_subpack1
│   └── src
└── organisation_package2_subpack2
    └── src
unnatural. To rearrange and break a working structure just to be able to package things seems wrong.
For my set of published packages I followed another way:
- I kept the natural tree structure that you can use "before packaging", without 'src' or 'lib' directories.
- I have a generic setup.py which reads and parses (it does not import) the metadata (such as version number, package name, license information, whether to install a utility (and its name)) from a dictionary in the __init__.py file, a file you need anyway (see the sketch after this list).
- The setup.py is smart enough to distinguish subdirectories containing other packages from subdirectories that are part of the parent package.
- setup.py generates files that are needed during package generation only (like setup.cfg), on the fly, and deletes them when no longer needed.
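A sketch of that parsing step, assuming the metadata lives in a dict literal named _package_data in __init__.py (that name is a convention of these packages, not a standard):

import ast
from pathlib import Path

def load_package_data(init_file='__init__.py'):
    # parse the source instead of importing it, so setup.py never
    # executes the package it is about to install
    tree = ast.parse(Path(init_file).read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == '_package_data'
                   for t in node.targets):
                return ast.literal_eval(node.value)
    raise ValueError('no _package_data found in ' + init_file)

pkg_data = load_package_data()
# setup(name=pkg_data['full_package_name'], version=pkg_data['version'], ...)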
This approach allows you to have nested namespaced packages (i.e. package2 can be a package you upload to PyPI, in addition to package2.subpack1 and package2.subpack2). The major thing it (currently) doesn't allow is using pip install -e to edit a single package (and not have the others editable). Given the way I develop, that is not a restriction.
This approach also embraces namespace packages, where many other approaches have problems with them (remember the last line of the Zen of Python: Namespaces are one honking great idea -- let's do more of those!).
Examples of the above can e.g. be found in my packages ruamel.yaml (and e.g. ruamel.yaml.cmd), or generically by searching PyPI for ruamel.
As is probably obvious, the standard disclaimer applies: I am the author of those packages.
As I use a utility to start packaging, which also runs the tests and does other sanity checks, the generic setup.py could be removed from the setup and inserted by that utility as well. But since subpackage detection is based on whether a setup.py is present or not, this requires some rework of the generic setup.py.

How to add an external package to the same namespace as a local directory?

Initial directory structure: (assume __init__.py files in directories)
project/
  ▾ lib/
    ▸ dir/
    ▸ package
I need to reuse lib.package in other projects, hence I created a Python package for it and removed the directory. But after installing it as lib.package, I can't import it from the root of the project, as the local lib directory there leads to a namespace collision.
Final structure:
▾ project/
  ▾ lib/
    ▸ dir/
And a package named lib.package installed in the virtualenv:
▾ lib/
  ▸ package/
      __init__.py
I looked into pkgutil.extend_path, but adding it to the __init__.py of the lib.package package didn't help. Are there any ways I can add both the local and the virtualenv-installed packages to the same namespace lib?
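For reference, the pkgutil pattern looks like this; note that it has to appear in every lib/__init__.py on sys.path, including the local project's copy, which is the one Python finds first (a likely reason the attempt above failed):

# lib/__init__.py -- needed identically in both the installed
# distribution's lib/ and the project's local lib/
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)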
In Python 2.7 it's better to avoid this problem by choosing a name for the dependency that's unique, or at least importable in your application's codebase without manipulating paths et cetera. I think the situation has improved in Python 3.4, but I'm not entirely sure here.
Furthermore, I had the same problem and decided to use unique names for another reason: when you wish to figure out where a module is from, it's quite hard when everything's namespaced under lib. It means the module can be from a dependency as well as the local application. It also means I can simply look at an import line and know, at all times, whether it's from my own application codebase, one of my own dependencies or an external dependency.

Accessing names defined in a package's `__init__.py` when running setuptools tests

I've taken to putting module code directly in a package's __init__.py, even for simple packages where this ends up being the only file.
So I have a bunch of packages that look like this (though they're not all called pants:)
+ pants/
  \-- __init__.py
  \-- setup.py
  \-- README.txt
  \--+ test/
     \-- __init__.py
I started doing this because it allows me to put the code in a separate (and, critically, separately versionable) directory, and have it work in the same way as it would if the package were located in a single module.py. I keep these in my dev python lib directory, which I have added into $PYTHONPATH when working on such things. Each package is a separate git repo.
edit...
Compared to the typical Python package layout, as exemplified in Radomir's answer, this setup saves me from having to add each package's directory into my PYTHONPATH.
.../edit
This has worked out pretty well, but I've hit upon this (somewhat obscure) issue:
When running tests from within the package directory, the package itself, i.e. code in __init__.py, is not guaranteed to be on the sys.path. This is not a problem under my typical environment, but if someone downloads pants-4.6.tgz and extracts a tarball of the source distribution, cds into the directory, and runs python setup.py test, the package pants itself won't normally be in their sys.path.
I find this strange, because I would expect setuptools to run the tests from a parent directory of the package under test. However, for whatever reason, it doesn't do that, I guess because normally you wouldn't package things this way.
Relative imports don't work because test is a top-level package, having been found as a subdirectory of the current-directory component of sys.path.
I'd like to avoid having to move the code into a separate file and importing its public names into __init__.py. Mostly because that seems like pointless clutter for a simple module.
I could explicitly add the parent directory to sys.path from within setup.py, but would prefer not to. For one thing, this could, at least in theory, fail, e.g. if somebody decides to run the test from the root of their filesystem (presumably a Windows drive). But mostly it just feels jerry-rigged.
Is there a better way?
Is it considered particularly bad form to put code in __init__.py?
I think the standard way to package Python programs would be more like this:
\-- setup.py
\-- README.txt
\--+ pants/
   \-- __init__.py
   \-- __main__.py
   ...
\--+ tests/
   \-- __init__.py
   ...
\--+ some_dependency_you_need/
   ...
Then you avoid the problem.
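With that layout, the legacy setuptools test runner also works without any path tricks (a sketch; the test_suite hook and python setup.py test are deprecated nowadays in favour of runners like pytest):

from setuptools import setup, find_packages

setup(
    name='pants',
    version='4.6',
    packages=find_packages(exclude=['tests*']),
    test_suite='tests',  # picked up by the legacy `python setup.py test`
)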

How do I create a namespace package in Python?

In Python, a namespace package allows you to spread Python code among several projects. This is useful when you want to release related libraries as separate downloads. For example, with the directories Package-1 and Package-2 in PYTHONPATH,
Package-1/namespace/__init__.py
Package-1/namespace/module1/__init__.py
Package-2/namespace/__init__.py
Package-2/namespace/module2/__init__.py
the end-user can import namespace.module1 and import namespace.module2.
What's the best way to define a namespace package so more than one Python product can define modules in that namespace?
TL;DR:
On Python 3.3 you don't have to do anything, just don't put any __init__.py in your namespace package directories and it will just work. On pre-3.3, choose the pkgutil.extend_path() solution over the pkg_resources.declare_namespace() one, because it's future-proof and already compatible with implicit namespace packages.
Python 3.3 introduces implicit namespace packages, see PEP 420.
This means there are now three types of object that can be created by an import foo:
A module represented by a foo.py file
A regular package, represented by a directory foo containing an __init__.py file
A namespace package, represented by one or more directories foo without any __init__.py files
Packages are modules too, but here I mean "non-package module" when I say "module".
When you import foo, Python first scans sys.path for a matching module or regular package. If it succeeds, it stops searching and creates and initializes the module or package. If it finds no module or regular package, but at least one matching directory, it creates and initializes a namespace package.
Modules and regular packages have __file__ set to the .py file they were created from. Regular and namespace packages have __path__ set to the directory or directories they were created from.
When you do import foo.bar, the above search happens first for foo; then, if a package was found, the search for bar is done with foo.__path__ as the search path instead of sys.path. If foo.bar is found, foo and foo.bar are created and initialized.
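You can see the result of this search interactively (a sketch; dir_a and dir_b are hypothetical directories, each containing a foo/ subdirectory without an __init__.py):

import sys
sys.path[0:0] = ['dir_a', 'dir_b']  # hypothetical directories

import foo
print(foo.__path__)  # a _NamespacePath listing both dir_a/foo and dir_b/foo
print(getattr(foo, '__file__', None))  # None/absent: no __init__.py was loaded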
So how do regular packages and namespace packages mix? Normally they don't, but the old pkgutil explicit namespace package method has been extended to include implicit namespace packages.
If you have an existing regular package that has an __init__.py like this:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
... the legacy behavior is to add any other regular packages on the searched path to its __path__. But in Python 3.3, it also adds namespace packages.
So you can have the following directory structure:
├── path1
│   └── package
│       ├── __init__.py
│       └── foo.py
├── path2
│   └── package
│       └── bar.py
└── path3
    └── package
        ├── __init__.py
        └── baz.py
... and as long as the two __init__.py files have the extend_path lines (and path1, path2 and path3 are on your sys.path), import package.foo, import package.bar and import package.baz will all work.
pkg_resources.declare_namespace(__name__) has not been updated to include implicit namespace packages.
There's a standard module, called pkgutil, with which you can 'append' modules to a given namespace.
With the directory structure you've provided:
Package-1/namespace/__init__.py
Package-1/namespace/module1/__init__.py
Package-2/namespace/__init__.py
Package-2/namespace/module2/__init__.py
You should put those two lines in both Package-1/namespace/__init__.py and Package-2/namespace/__init__.py (*):
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
(* since, unless you state a dependency between them, you don't know which of them will be recognized first; see PEP 420 for more information)
As the documentation says:
This will add to the package's __path__ all subdirectories of directories on sys.path named after the package.
From now on, you should be able to distribute those two packages independently.
This section should be pretty self-explanatory.
In short, put the namespace code in __init__.py, update setup.py to declare a namespace, and you are free to go.
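Concretely, the pkg_resources-style recipe from the setuptools documentation of that era looks like this (a sketch; the distribution name is a placeholder):

# namespace/__init__.py -- identical in every distribution
__import__('pkg_resources').declare_namespace(__name__)

# setup.py for Package-1
from setuptools import setup
setup(
    name='namespace.module1',  # placeholder distribution name
    version='0.1.0',
    packages=['namespace', 'namespace.module1'],
    namespace_packages=['namespace'],
)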
This is an old question, but someone recently commented on my blog that my posting about namespace packages was still relevant, so thought I would link to it here as it provides a practical example of how to make it go:
https://web.archive.org/web/20150425043954/http://cdent.tumblr.com/post/216241761/python-namespace-packages-for-tiddlyweb
That links to this article for the main guts of what's going on:
http://www.siafoo.net/article/77#multiple-distributions-one-virtual-package
The __import__("pkg_resources").declare_namespace(__name__) trick pretty much drives the management of plugins in TiddlyWeb, and so far it seems to be working out.
You have your Python namespace concepts back to front: it is not possible in Python to put packages into modules. Packages contain modules, not the other way around.
A Python package is simply a folder containing an __init__.py file. A module is any other file in a package (or directly on the PYTHONPATH) that has a .py extension. So in your example you have two packages but no modules defined. If you consider that a package is a file system folder and a module is a file, then you see why packages contain modules and not the other way around.
So in your example assuming Package-1 and Package-2 are folders on the file system that you have put on the Python path you can have the following:
Package-1/
    namespace/
        __init__.py
        module1.py
Package-2/
    namespace/
        __init__.py
        module2.py
You now have one package, namespace, with two modules, module1 and module2. And unless you have a good reason, you should probably put the modules in one folder and have only that on the Python path, like below:
Package-1/
    namespace/
        __init__.py
        module1.py
        module2.py
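Either way, with the containing folder on the Python path, both modules import from the single namespace package:

from namespace import module1, module2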
