I'm writing a couple of packages that I'd like to release on PyPI for other people to use.
I've not released to PyPI before, so I have been mocking up a submission template: https://github.com/chris-brown-nz/pypi-package-template
Here's a tree of the project template:
|   MANIFEST.in
|   README.rst
|   setup.cfg
|   setup.py
|
\---package
        module_one.py
        module_three.py
        module_two.py
        __init__.py
In terms of interacting with the package, this is what I would usually do - is it the best way?
To run a method:
from package import module_one
module_one.ClassOne().method_a()
To get a value from a method:
from package import module_two
print(module_two.ClassFive().method_e())
To set then use an attribute of an instance:
from package import module_three
cls = module_three.ClassSeven("Hello World")
print(cls.value)
'package' is a reserved name obviously and won't be used in the final project.
I'd be grateful for some feedback on how I've structured my project and whether it is considered standard, or if it should be modified in some way.
There are different approaches to this; whether one or the other is better depends on how you want to develop, how the package will be used (e.g. whether you ever install it using pip install -e package_name), etc.
What is missing from your tree is the name of the directory where the setup.py resides, and that is usually the package name:
└── package
    ├── package
    │   ├── __init__.py
    │   ├── module_one.py
    │   ├── module_three.py
    │   └── module_two.py
    ├── MANIFEST.in
    ├── README.rst
    ├── setup.cfg
    └── setup.py
As you can see, you are doubling the 'package' name, which means that your setup.py either has to be adapted for each package or has to dynamically determine the name of the directory where the module.py files reside. If you go this route, I would suggest you put the module.py files in a generically named directory such as 'src' or 'lib'.
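For illustration, here is a hedged sketch of what a setup.py for such a layout could look like with the module.py files under a generically named 'src' directory (the names are just the placeholders from the question, not a recommendation):

from setuptools import setup

# Illustrative only: map the generically named 'src' directory onto the
# import package, so users still write `from package import module_one`.
setup(
    name="package",
    version="0.1.0",
    package_dir={"package": "src"},
    packages=["package"],
)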
I don't like the above "standard" setup for multiple reasons:
it doesn't map well to how Python programs "grow" before they are split up into packages. Before splitting up, having such a 'src' directory would mean using:
from package.src.module_one import MyModuleOneClass
Instead you would have your module.py files directly under package
Having a setup.py to control installation, a README.rst for documentation and an __init__.py to satisfy Python's import is one thing, but all other stuff, apart from your module.py files containing the actual functionality, is garbage. Garbage that might be needed at some point during the package creation process, but is not necessary for the package functionality.
There are other considerations, such as being able to access the version number of the package from setup.py as well as from the program itself, without setup.py having to import the package (which may lead to install complications), and without an extra version.py file that needs importing.
In particular I always found the transition from using a directory structure under site-packages that looked like:
└── organisation
    ├── package1
    └── package2
        ├── subpack1
        └── subpack2
and that could intuitively be used for both importing and navigation to source files, to something like:
├── organisation_package1
│   └── src
├── organisation_package2_subpack1
│   └── src
└── organisation_package2_subpack2
    └── src
unnatural. To rearrange and break a working structure to be able to package things seems wrong.
For my set of published packages I followed another way:
- I kept the natural tree structure that you can use "before packaging", without 'src' or 'lib' directories.
- I have a generic setup.py which reads and parses (it does not import) the metadata (such as version number, package name, license information, whether to install a utility and its name) from a dictionary in the __init__.py file, a file you need anyway (see the sketch after this list).
- The setup.py is smart enough to distinguish subdirectories containing other packages from subdirectories that are part of the parent package.
- setup.py generates files that are needed during package generation only (like setup.cfg), on the fly, and deletes them when no longer needed.
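As an illustration of the second point, here is a minimal sketch of reading such a dictionary from __init__.py without importing the package; the variable name _package_data and the plain-dict format are assumptions for the example, not necessarily what the ruamel packages actually use:

import ast

def read_package_metadata(path="__init__.py"):
    # Parse the file as source text; nothing is imported or executed.
    with open(path) as f:
        tree = ast.parse(f.read())
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "_package_data":
                    return ast.literal_eval(node.value)  # works for a plain dict literal
    raise ValueError("no _package_data dict found in " + path)

metadata = read_package_metadata()
version = metadata["version"]  # usable from setup.py and from the package itself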
The above allows you to have nested namespaced packages (i.e. package2 can be a package you upload to PyPI, in addition to package2.subpack1 and package2.subpack2). The major thing it (currently) doesn't allow is using pip install -e to edit a single package (and not have the others editable). Given the way I develop, that is not a restriction.
The above embraces namespace packages, whereas many other approaches have problems with them (remember the last line of the Zen of Python: Namespaces are one honking great idea -- let's do more of those!).
Examples of the above can e.g. be found in my packages ruamel.yaml (and e.g. ruamel.yaml.cmd), or generically by searching PyPI for ruamel.
As is probably obvious, the standard disclaimer: I am the author of those packages.
Since I use a utility to start packaging, which also runs the tests and does other sanity checks, the generic setup.py could be left out of the package source and inserted by that utility as well. But since subpackage detection is based on whether a setup.py is present or not, that would require some rework of the generic setup.py.
Related
I am not very familiar with Python; I have only done automation with it, so I am new to packages and everything around them.
I am creating an API with Flask, Gunicorn and Poetry.
I noticed that there is a version number inside the pyproject.toml and I would like to create a route /version which returns the version of my app.
My app structure looks like this at the moment:
├── README.md
├── __init__.py
├── poetry.lock
├── pyproject.toml
├── tests
│   └── __init__.py
└── wsgi.py
Where wsgi.py is my main file, which runs the app.
I saw people using importlib but I couldn't figure out how to make it work, as it is used like this:
__version__ = importlib.metadata.version("__package__")
But I have no clue what this __package__ means.
You should not use __package__, which is the name of the "import package" (or maybe import module, depending on where this line of code is located), and this is not what importlib.metadata.version() expects. This function expects the name of the distribution package (the thing that you pip-install), which is the one you write in pyproject.toml as name = "???".
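For example, if your pyproject.toml declares name = "my-app" (a stand-in for whatever name you actually used), the lookup would be:

import importlib.metadata

# "my-app" is the distribution name from pyproject.toml, i.e. the thing you
# pip-install, not the import package and not __package__.
__version__ = importlib.metadata.version("my-app")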
You can extract the version from pyproject.toml by using the toml package to read the file, and then display it on a webpage.
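For example, a minimal sketch assuming the Poetry layout from the question, the toml package installed, and pyproject.toml sitting next to wsgi.py:

import toml
from flask import Flask

app = Flask(__name__)

# Read the version once at startup; adjust the path if pyproject.toml lives elsewhere.
pyproject = toml.load("pyproject.toml")
APP_VERSION = pyproject["tool"]["poetry"]["version"]

@app.route("/version")
def version():
    return {"version": APP_VERSION}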
Here is my tree (simplified):
└── internal_models
    ├── models
    │   ├── __init__.py
    │   └── api
    │       └── my_code.py
    └── setup.py
And my setup.py:
from setuptools import setup

setup(name='internal-models',
      version='0.0.2',
      description='models package',
      packages=["models"],
      zip_safe=False,
      install_requires=[])
When I install with pip install . or python setup.py build, Setuptools installs internal-models (which cannot be imported anyway because of an illegal character), instead of the package I want, models. What am I doing wrong? Have read the setuptools Quickstart and various related questions but am still confused.
So this problem was only due to my misunderstanding, but I thought I'd clarify it in an answer since there's a distinction here that wasn't so clear (to me) from the Setuptools documentation.
The distribution name defined in setup.py/setup.cfg is the overall package name, which is recorded in your environment's site-packages directory and is what pip freeze outputs. It is not importable. Valid names are defined in PEP 508. They may, for example, contain a dash; in an import package name a dash is, while not strictly illegal, discouraged by PEP 8, and such a name can't be imported in the standard way, since Python interprets the dash as a minus sign.
The import packages (or modules) defined in setup.py/setup.cfg are what you can import in Python. So in my case, internal-models was being installed, but the way I would use the models package is through import models (the behaviour I wanted).
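To make the distinction concrete with the setup.py from the question (illustrative session):

# After `pip install .`, `pip freeze` lists the distribution name:
#     internal-models==0.0.2
# but what you import is the import package declared in packages=["models"]:
import models              # works
# import internal_models   # fails: no module of that name is installed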
In the last few days, I have been working on a Python module. Until now I have used Poetry as a package management tool in many other projects, but this is my first time publishing a package to PyPI.
I was able to run the poetry build and poetry publish commands. I was also able to install the published package:
$ pip3 install git-profiles
Collecting git-profiles
Using cached https://files.pythonhosted.org/packages/0e/e7/bac9027effd1e34a5b5718f2b35c0b28b3d67f3809e2f2981b6c7b58963e/git_profiles-1.1.0-py3-none-any.whl
Installing collected packages: git-profiles
Successfully installed git-profiles-1.1.0
However, right after the install, I am not able to run my package:
$ git-profiles --help
git-profiles: command not found
My project has the following structure:
git-profiles/
├── src/
│   ├── commands/
│   ├── executor/
│   ├── git_manager/
│   ├── profile/
│   ├── utils/
│   ├── __init__.py
│   └── git_profiles.py
└── tests
I tried different scripts configurations in the pyproject.toml file, but I've never been able to make it work after installing.
[tool.poetry.scripts]
poetry = "src:git_profiles.py"
or
[tool.poetry.scripts]
git-profile = "src:git_profiles.py"
I don't know if this is a Python/pip path/version problem or whether I need to change something in the configuration file.
If it is helpful, this is the GitHub repository I'm talking about. The package is also published on PyPI.
Poetry's scripts section wraps around the console-script definition of setuptools. As such, the entry point name and the call path you give it need to follow exactly the same rules.
In short, a console script does more or less this from the shell:
import my_lib # the module isn't called src, that's just a folder name
# the right name to import is whatever you put at [tool.poetry].name
my_lib.my_module.function()
Which, if given the name my-lib-call (the name can be the same as your module, but it doesn't need to be) would be written like this:
[tool.poetry.scripts]
my-lib-call = "my_lib.my_module:function"
Adapted to your project structure, the following should do the job:
[tool.poetry.scripts]
git-profile = "git_profiles:main"
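For that entry point to resolve, the target module needs a callable named main. A minimal sketch of src/git_profiles.py (the body is purely illustrative, and assumes Poetry is configured to package the module from the src directory):

# src/git_profiles.py
import sys

def main():
    """Entry point called by the generated git-profile console script."""
    # Purely illustrative body; the real CLI logic goes here.
    print("git-profiles called with arguments:", sys.argv[1:])

if __name__ == "__main__":
    main()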
I realize there are a slew of posts on SO related to Python and imports, but it seems like a fair number of these posts are asking about import rules/procedures with respect to creating an actual Python package (vs just a project with multiple directories and python files). I am very new to Python and just need some more basic clarification on what is and is not possible with regard to access/importing within the context of multiple py files in a project directory.
Let's say you have the following project directory (to be clear, this is not a package that is somewhere on sys.path, but say, on your Desktop):
myProject/
├── __init__.py
├── scriptA.py
└── subfolder
    ├── __init__.py
    ├── scriptB.py
    └── subsubfolder
        ├── __init__.py
        ├── scriptC.py
        └── foo.py
Am I correct in understanding that, if scriptC.py is run directly via $ python scriptC.py from within the subsubfolder directory, the only way it could import and use methods or classes from scriptB.py is if I add the parent directory (the path to scriptB.py) to the Python path at runtime via sys.path?
It is possible, however, for scriptC.py to import foo.py, or for scriptB.py to import scriptC.py or foo.py, without dealing with sys.path, correct? Adjacent .py files and .py files in subdirectories are accessible just by using relative import paths; you just can't import Python scripts that live in parent or sibling directories (without using sys.path)?
What's Possible
Anything.
No, really. See the imp module, the imputil module -- take a look at how the zipimport module is written if you want some inspiration.
If you can get a string with your module's code in a variable, you can get a module into sys.modules using the above, and perhaps hack around with its contents using the ast module on the way.
A custom import hook that looks in parent directories? Well within the range of possibilities.
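For a flavour of what such a hook could look like, here is a minimal sketch using the modern importlib machinery rather than imp; the finder name and the single-parent-directory search are assumptions for the example:

import importlib.util
import os
import sys

class ParentDirFinder:
    """Illustrative meta path finder that also looks for top-level .py
    modules in the parent of the current working directory."""

    def find_spec(self, fullname, path=None, target=None):
        candidate = os.path.join(os.path.dirname(os.getcwd()), fullname + ".py")
        if os.path.isfile(candidate):
            return importlib.util.spec_from_file_location(fullname, candidate)
        return None

sys.meta_path.append(ParentDirFinder())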
What's Best Practice
What you're proposing isn't actually good practice. The best-practice approach looks more like the following:
myProject/
├── setup.py
└── src/
    ├── moduleA.py
    └── submodule/
        ├── __init__.py
        ├── moduleB.py
        └── subsubmodule/
            ├── __init__.py
            └── moduleC.py
Here, the top of your project is always in myProject/src. If you use setup.py to configure moduleA:main, submodule.moduleB:main and submodule.subsubmodule.moduleC:main as entry points (perhaps named scriptA, scriptB and scriptC), then the function named main in each of those modules would be invoked when the user runs the script so named (generated automatically by setuptools).
With this layout (and appropriate setuptools use), your moduleC.py can absolutely import moduleA, or import submodule.moduleB.
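A hedged sketch of what that setup.py could look like (the package_dir/py_modules mapping is one common way to wire up the src layout; the names follow the tree above):

from setuptools import setup, find_packages

setup(
    name="myProject",
    version="0.1.0",
    package_dir={"": "src"},               # the import hierarchy starts at src/
    packages=find_packages(where="src"),   # finds submodule and submodule.subsubmodule
    py_modules=["moduleA"],                # top-level module sitting directly in src/
    entry_points={
        "console_scripts": [
            "scriptA = moduleA:main",
            "scriptB = submodule.moduleB:main",
            "scriptC = submodule.subsubmodule.moduleC:main",
        ],
    },
)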
Another approach, which doesn't involve entry points, is to invoke the code in your moduleC.py (while keeping the module's intended hierarchy intact, and assuming you're in a virtualenv where python setup.py develop has been run) like so:
python -m submodule.subsubmodule.moduleC
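Either way, moduleC.py needs a main function to hang this on; a minimal sketch (the imports assume the project has been installed, e.g. via the setup.py develop step above):

# src/submodule/subsubmodule/moduleC.py
import moduleA                  # resolvable once the project is installed
from submodule import moduleB

def main():
    # Purely illustrative body.
    print("moduleC can see:", moduleA.__name__, moduleB.__name__)

if __name__ == "__main__":      # lets `python -m submodule.subsubmodule.moduleC` work too
    main()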
I have a module that sits in a namespace. Should tests, and the data the tests rely on, go in the namespace or at the top level where setup.py sits?
./company/__init__.py
./company/namespace/__init__.py
./company/namespace/useful.py
./company/namespace/test_useful.py
./company/namespace/test_data/useful_data.xml
./setup.py
or
./company/__init__.py
./company/namespace/__init__.py
./company/namespace/useful.py
./test_useful.py
./test_data/useful_data.xml
./setup.py
Does the question amount to whether tests should be installed or not?
The Sample Project stores the tests outside the module.
The directory structure looks like this:
├── data
│   └── data_file
├── MANIFEST.in
├── README.rst
├── sample
│   ├── __init__.py
│   └── package_data.dat
├── setup.cfg
├── setup.py
└── tests
    ├── __init__.py
    └── test_simple.py
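With the tests outside the package like this, a setup.py along these lines (a hedged sketch, not the actual sample project file) keeps them out of the installed distribution:

from setuptools import setup, find_packages

setup(
    name="sample",
    version="0.1.0",
    # Only the `sample` package is installed; the top-level `tests` package is
    # excluded (whether it ships in the sdist depends on MANIFEST.in).
    packages=find_packages(exclude=["tests", "tests.*"]),
    package_data={"sample": ["package_data.dat"]},
)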
Related: The Packaging Guide: https://packaging.python.org/en/latest/
Hint: Don't follow "The Hitchhiker's Guide to Packaging". It has not been updated since 2010!
(Do not confuse the two: "The Hitchhiker's Guide to Python" is a very solid book.)
You should put your test module inside the module it tests according to The Hitchhiker's Guide to Packaging.
Here is their example:
TowelStuff/
    bin/
    CHANGES.txt
    docs/
    LICENSE.txt
    MANIFEST.in
    README.txt
    setup.py
    towelstuff/
        __init__.py
        location.py
        utils.py
        test/
            __init__.py
            test_location.py
            test_utils.py
This way your module will be distributed with its tests and users can use them to verify that it works with their set up.
See http://the-hitchhikers-guide-to-packaging.readthedocs.org/en/latest/creation.html.
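Users could then run the bundled tests with something along these lines (illustrative; the dotted names follow the TowelStuff layout above):

import unittest

# Load the test modules that were installed as part of the package and run them.
suite = unittest.defaultTestLoader.loadTestsFromNames(
    ["towelstuff.test.test_location", "towelstuff.test.test_utils"]
)
unittest.TextTestRunner(verbosity=2).run(suite)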
I personally create a single tests package as a sub package of the main package for a few reasons:
If tests is parallel to the root package, there's an off chance that you, or a user, may misconfigure setup.py and accidentally expose a global package named tests, which will cause a great deal of confusion and headache until you realize what has happened. Putting it inside the main package solves this, as it's now under a (hopefully) globally unique namespace.
I don't like sprinkling test modules in amongst the production code inside the package, because test runners then have to search through production code. This is probably not a problem for most. But if you happen to be a hardware test engineer, you probably use the word 'test' a lot in your production code and don't want the unit test runner to pick that stuff up. It's much easier if all the tests are in one place, separate from the production code.
I can further subdivide my tests folder into the types of tests, such as unit, functional and integration. My functional tests tend to have dependencies on weird proprietary hardware, data or are slow. So it's easy for me to continuously run just the fast unit test folder as I develop.
It can sometimes be convenient to have the tests inside the same package hierarchy as what they are testing.
Overall though, I think it's important to think for yourself about what's best for your particular problem domain after taking everyone's advice into account. 'Best practices' are great starting points, not end points, for developing a process.