I have a package called clana (Github, PyPI) with the following structure:
.
├── clana
│ ├── cli.py
│ ├── config.yaml
│ ├── __init__.py
│ ├── utils.py
│ └── visualize_predictions.py
├── docs/
├── setup.cfg
├── setup.py
├── tests/
└── tox.ini
The setup.py looks like this:
from setuptools import find_packages
from setuptools import setup
requires_tests = [...]
install_requires = [...]
config = {
"name": "clana",
"version": "0.3.6",
"author": "Martin Thoma",
"author_email": "info#martin-thoma.de",
"maintainer": "Martin Thoma",
"maintainer_email": "info#martin-thoma.de",
"packages": find_packages(),
"entry_points": {"console_scripts": ["clana=clana.cli:entry_point"]},
"install_requires": install_requires,
"tests_require": requires_tests,
"package_data": {"clana": ["clana/config.yaml"]},
"include_package_data": True,
"zip_safe": False,
}
setup(**config)
How to check that it didn't work
Quick
python3 setup.py sdist
open dist/clana-0.3.8.tar.gz # config.yaml is not in this file
The real check
I thought this would make sure that the config.yaml is in the same directory as the cli.py when the package is installed. But when I try this:
virtualenv venv
source venv/bin/activate
pip install clana
cd venv/lib/python3.6/site-packages/clana
ls
I get:
cli.py __init__.py __pycache__ utils.py visualize_predictions.py
The way I upload it to PyPI:
python3 setup.py sdist bdist_wheel && twine upload dist/*
So the config.yaml is missing. How can I make sure it is there?
You can add a file named MANIFEST.in next to setup.py with a list of the files you want to add, wildcards allowed (e.g. include *.yaml or include clana/config.yaml).
The option include_package_data=True then activates the manifest file.
In short: add config.yaml to MANIFEST.in, and set include_package_data. One without the other is not enough.
Basically it goes like this:
MANIFEST.in adds files to sdist (source distribution).
include_package_data adds these same files to bdist (built distribution), i.e. it extends the effect of MANIFEST.in to bdist.
exclude_package_data prevents files in sdist from being added to bdist, i.e. it filters the effect of include_package_data.
package_data adds files to bdist, i.e. it adds build artifacts (typically the products of custom build steps) to your bdist and of course has no effect on sdist.
So in your case, the file config.yaml is not installed because it is not added to your bdist (built distribution). There are two ways to fix this, depending on where the file comes from:
either the file is a build artifact (typically it is somehow created during the ./setup.py build phase), then you need to add it to package_data;
or the file is part of your source (typically it is in your source code repository), then you need to add it to MANIFEST.in, set include_package_data, and leave it out of exclude_package_data (this seems to be your case here; see the sketch below).
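For the layout in the question, a minimal sketch of that second route could look like this. Note that if you also keep package_data, its paths are relative to the package directory, so "config.yaml" rather than "clana/config.yaml":
# setup.py (sketch) -- a MANIFEST.in next to it contains the single line:
#   include clana/config.yaml
from setuptools import find_packages, setup

setup(
    name="clana",
    packages=find_packages(),
    # extends the MANIFEST.in entries to the built distribution as well
    include_package_data=True,
    # optional here; paths are relative to the clana package directory
    package_data={"clana": ["config.yaml"]},
)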
See:
https://stackoverflow.com/a/54953494/11138259
https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files
Following from the documentation on including data files, if your package has data files such as .yaml files, you may include them like so:
setup(
...
package_data={
"": ["*.yaml"],
},
...
)
This will allow any file in your package with the file extension .yaml to be included.
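Once the file actually ships inside the package, code such as clana/cli.py can locate it relative to the installed module rather than the current working directory; a small sketch, assuming the file is installed as clana/config.yaml:
from pathlib import Path

# e.g. inside clana/cli.py: resolve config.yaml next to the installed module,
# not relative to the current working directory
CONFIG_PATH = Path(__file__).parent / "config.yaml"
raw_config = CONFIG_PATH.read_text()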
Related
Over the last few days I have been working on a Python module. I have used poetry as a package management tool in many other projects, but this is my first time publishing a package to PyPI.
I was able to run the poetry build and poetry publish commands. I was also able to install the published package:
$ pip3 install git-profiles
Collecting git-profiles
Using cached https://files.pythonhosted.org/packages/0e/e7/bac9027effd1e34a5b5718f2b35c0b28b3d67f3809e2f2981b6c7b58963e/git_profiles-1.1.0-py3-none-any.whl
Installing collected packages: git-profiles
Successfully installed git-profiles-1.1.0
However, right after the install, I am not able to run my package:
$ git-profiles --help
git-profiles: command not found
My project has the following structure:
git-profiles/
├── src/
│ ├── commands/
│ ├── executor/
│ ├── git_manager/
│ ├── profile/
│ ├── utils/
│ ├── __init__.py
│ └── git_profiles.py
└── tests
I tried different scripts configurations in the pyproject.toml file, but I've never been able to make the command work after installation.
[tool.poetry.scripts]
poetry = "src:git_profiles.py"
or
[tool.poetry.scripts]
git-profile = "src:git_profiles.py"
I don't know whether this is a Python/pip path or version problem, or whether I need to change something in the configuration file.
If it is helpful, this is the GitHub repository I'm talking about. The package is also published on PyPI.
Poetry's scripts section wraps around the console-script definition of setuptools. As such, the entry point name and the call path you give it need to follow the exact same rules.
In short, a console script does more or less this from the shell:
import my_lib # the module isn't called src, that's just a folder name
# the right name to import is whatever you put at [tool.poetry].name
my_lib.my_module.function()
Which, if given the name my-lib-call (the name can be the same as your module, but it doesn't need to be), would be written like this:
[tool.poetry.scripts]
my-lib-call = "my_lib.my_module:function"
Adapted to your project structure, the following should do the job. Note that the part to the right of the equals sign must be an importable module path, so underscores rather than dashes:
[tool.poetry.scripts]
git-profile = "git_profiles:main"
I only run into this problem when I build in a python:alpine image. Reproducing it is a bit of a pain, but these are the steps:
Docker container setup:
$ docker run -it python:3.7-rc-alpine /bin/ash
$ pip install pbr
Small package setup, including non-Python files:
test
├── .git
├── setup.cfg
├── setup.py
└── src
└── test
├── __init__.py
├── test.yml
└── sub_test
├── __init__.py
└── test.yml
setup.py:
from setuptools import find_packages, setup
setup(
setup_requires=['pbr'],
pbr=True,
package_dir={'': 'src'},
packages=find_packages(where='src'),
)
setup.cfg:
[metadata]
name = test
All other files are empty. I copy them to the container with docker cp test <docker_container>:/test.
Back in the container I now try to build the package with cd test; pip wheel -w wheel . and find that the test.yml in test/src/test is included in the wheel, but the one in test/src/test/sub_test is not.
I have no clue why this happens, since the (pitifully sparse, and imo quite confusing) documentation of pbr on that matter states that
Just like AUTHORS and ChangeLog, why keep a list of files you wish to include when you can find many of these in git. MANIFEST.in generation ensures almost all files stored in git, with the exception of .gitignore, .gitreview and .pyc files, are automatically included in your distribution.
I could not find a pbr parameter that lets me explicitly include a file or file type, which I expected to exist.
Creating a MANIFEST.in with include src/test/sub_test/test.yml actually solves this problem, but I'd rather understand and avoid this behavior altogether.
pbr needs git in order to correctly compile its files-to-include list, so the problem can be solved by installing git into the build environment before building the package. With an alpine image, that would be apk add --no-cache git.
pbr uses the .git directory to figure out which files should be part of the package and which ones shouldn't. The short version is that it takes the intersection of the file list from the packages parameter in the setup() call in setup.py and everything that is committed or staged in the currently checked out git branch.
So if the project came with no .git directory, you'd additionally need to execute git init; git add src as well.
The reason for the 'bug' is that pbr silently assumes that all .py files should be added regardless of whether they are committed or not, which makes the actual problem harder to identify. It will also only throw an error if it can't find a .git directory, and not if the directory is there but it can't get any info from it because git isn't installed.
I am trying to upload my project to Google Cloud ML Engine for training. I have followed the "getting started" guide, replacing files with my own in the relevant places.
I manage to train locally using
gcloud ml-engine local train --module-name="my-model.task" --package-path=my-model/ -- ./my_model/model_params_google.json
Yes, I have dashes in the module name :(. I also made a symbolic link my_model -> my-model so that I can use the name with an underscore instead of a dash. In any case, I don't think this is the problem, since the above command works well locally.
My folder structure doesn't follow the recommended one, since I had the project before thinking about ml-engine. It looks like this:
my-model/
├── __init__.py
├── setup.py
├── task.py
├── model_params_google.json
├── src
│ ├── __init__.py
│ ├── data_handler.py
│ ├── elastic_helpers.py
│ ├── model.py
The problem is that the src folder is not packaged/uploaded with the code, so in the cloud, when I say from .src.model import model_fn in task.py, it fails.
The command I use for packaging is (in folder my-model/../):
gcloud ml-engine jobs submit training my_model_$(date +"%Y%m%d_%H%M%S") \
--staging-bucket gs://model-data \
--job-dir $OUTPUT_PATH \
--module-name="my_model.task" \
--package-path=my_model/ \
--region=$REGION \
--config config.yaml --runtime-version 1.8 \
-- \
tf_crnn/model_params_google.json --verbosity DEBUG
It packages my-model.0.0.0.tar.gz without the contents of my-model/src. I cannot figure out why. I'm using the example setup.py:
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['tensorflow>=1.8']
setup(
name='my_model',
version='0.1',
install_requires=REQUIRED_PACKAGES,
packages=find_packages(),
include_package_data=True,
description='my first model'
)
So, the question is: why does gcloud not package the src folder?
You need to put the setup.py in the directory above my-model.
You can check your results by invoking:
python setup.py sdist
Then untar the tarball in the dist directory. As is, you'll see that task.py is not included in the tarball.
By moving setup.py one directory higher and repeating, you'll see that task.py is included, as is everything in src.
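Sketched out, the layout after the move might look roughly like this (a sketch based on the tree in the question; the package directory needs an importable name, hence my_model rather than my-model):
.
├── setup.py
└── my_model
    ├── __init__.py
    ├── task.py
    ├── model_params_google.json
    └── src
        ├── __init__.py
        ├── data_handler.py
        ├── elastic_helpers.py
        └── model.py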
Let's say I have very simple package with a following structure:
.
├── foo
│ ├── bar
│ │ └── __init__.py
│ └── __init__.py
└── setup.py
Content of the files:
setup.py:
from distutils.core import setup
setup(
name='foobar',
version='',
packages=['foo', 'foo.bar'],
url='',
license='Apache License 2.0',
author='foobar',
author_email='',
description=''
)
foo/bar/__init__.py:
def foobar(x):
return x
The remaining files are empty.
I install the package using pip:
cd foobar
pip install .
and can confirm it is installed correctly.
Now I want to create a separate package with stub files:
.
├── foo
│ ├── bar
│ │ └── __init__.pyi
│ └── __init__.pyi
└── setup.py
Content of the files:
setup.py:
from distutils.core import setup
import sys
import pathlib
setup(
name='foobar_annot',
version='',
packages=['foo', 'foo.bar'],
url='',
license='Apache License 2.0',
author='foobar',
author_email='',
description='',
data_files=[
(
'shared/typehints/python{}.{}/foo/bar'.format(*sys.version_info[:2]),
["foo/bar/__init__.pyi"]
),
],
)
foo/bar/__init__.pyi:
def foobar(x: int) -> int: ...
I can install this package and see that it creates anaconda3/shared/typehints/python3.5/foo/bar/__init__.pyi in my Anaconda root, but it doesn't look like it is recognized by PyCharm (I get no warnings). When I place the .pyi file in the main package, everything works OK.
I would be grateful for any hints on how to make this work:
I've been trying to make some sense of PEP 484 (Storing and distributing stub files), but to no avail. Even the pathlib part seems to offend my version of distutils.
PY-18597 and https://github.com/python/mypy/issues/1190#issuecomment-188526651 seem to be related but somehow I cannot connect the dots.
I tried putting the stubs in .PyCharmX.X/config/python-skeletons, but it didn't help.
Some things that work, but don't resolve the problem:
Putting stub files in the current project and marking as sources.
Adding stub package root to the interpreter path (at least in some simple cases).
So the question: how do I create a minimal, distributable package with Python stubs that will be recognized by existing tools? Based on the experiments, I suspect one of two problems:
I misunderstood the structure which should be created by the package in shared/typehints/pythonX.Y; if this is true, how should I define data_files?
PyCharm doesn't consider these files at all (this seems to be contradicted by some comments in the linked issue).
It is supposed to work just fine, but I made some configuration mistake and I'm looking for an external problem which doesn't exist.
Are there any established procedures to troubleshoot problems like this?
The problem is that you didn't include the foo/__init__.pyi file in your stub distribution. Even though it's empty, it makes foo a stub package and enables the search for foo.bar.
You can modify data_files in your setup.py to include both:
data_files=[
(
'shared/typehints/python{}.{}/foo/bar'.format(*sys.version_info[:2]),
["foo/bar/__init__.pyi"]
),
(
'shared/typehints/python{}.{}/foo'.format(*sys.version_info[:2]),
["foo/__init__.pyi"]
),
],
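Put together with the rest of the setup call from the question, the stub package's setup.py would then look roughly like this (a sketch; version and metadata left empty as in the question):
from distutils.core import setup
import sys

setup(
    name='foobar_annot',
    version='',
    packages=['foo', 'foo.bar'],
    url='',
    license='Apache License 2.0',
    author='foobar',
    author_email='',
    description='',
    # ship both stub files so that foo is recognized as a stub package
    data_files=[
        (
            'shared/typehints/python{}.{}/foo'.format(*sys.version_info[:2]),
            ["foo/__init__.pyi"],
        ),
        (
            'shared/typehints/python{}.{}/foo/bar'.format(*sys.version_info[:2]),
            ["foo/bar/__init__.pyi"],
        ),
    ],
)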
I have a project named myproj structured like
/myproj
__init__.py
module1.py
module2.py
setup.py
my setup.py looks like this
from distutils.core import setup
setup(name='myproj',
version='0.1',
description='Does projecty stuff',
author='Me',
author_email='me@domain.com',
packages=[''])
But this places module1.py and module2.py in the install directory.
How do I specify setup such that the directory /myproj and all of its contents are dropped into the install directory?
In the myproj root directory for this project, move module1.py and module2.py into a directory named myproj underneath it, and, if you wish to maintain Python < 3.3 compatibility, add an __init__.py there as well.
├── myproj
│ ├── __init__.py
│ ├── module1.py
│ └── module2.py
└── setup.py
You may also consider using setuptools instead of just distutils. setuptools provides a lot more helper methods and additional attributes that make setting up this file a lot easier. This is the bare minimum setup.py I would construct for the above project:
from setuptools import setup, find_packages
setup(name='myproj',
version='0.1',
description="My project",
author='me',
author_email='me@example.com',
packages=find_packages(),
)
Running the installation you should see lines like this:
copying build/lib.linux-x86_64-2.7/myproj/__init__.py -> build/bdist.linux-x86_64/egg/myproj
copying build/lib.linux-x86_64-2.7/myproj/module1.py -> build/bdist.linux-x86_64/egg/myproj
copying build/lib.linux-x86_64-2.7/myproj/module2.py -> build/bdist.linux-x86_64/egg/myproj
This signifies that the setup script has picked up the required source files. Run the Python interpreter (preferably outside this project directory) and check that those modules can be imported from the installed package (and not just via a relative import from the source tree).
On the other hand, if you wish to provide those modules at the root level, you definitely need to declare py_modules explicitly.
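A rough sketch of that alternative, keeping module1.py and module2.py at the top level next to setup.py:
from setuptools import setup

setup(
    name='myproj',
    version='0.1',
    description='Does projecty stuff',
    author='Me',
    author_email='me@domain.com',
    # install the two files as top-level modules in site-packages
    py_modules=['module1', 'module2'],
)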
Finally, the Python Packaging User Guide is a good resource for more specific questions anyone may have about building distributable python packages.