How to include and access data files in Python distribution? - python

I want to create a distributable Python package. For that, I organized my directories as follows:
.
├── config
│   └── test.yml
├── MANIFEST.in
├── sample
│   ├── hello.py
│   ├── __init__.py
│   └── world
│       ├── __init__.py
│       └── refer.py
└── setup.py
The MANIFEST.in contains only one line:
graft config
The setup.py is as follows:
from setuptools import setup

setup(
    name='sample',
    version='1.0',
    packages=[
        'sample',
        'sample.world'
    ],
    include_package_data=True
)
However, after I have run pip install ., I end up with the following content of the target directory:
.
./__pycache__
./__pycache__/__init__.cpython-36.pyc
./__pycache__/hello.cpython-36.pyc
./world
./world/__pycache__
./world/__pycache__/__init__.cpython-36.pyc
./world/__pycache__/refer.cpython-36.pyc
./world/refer.py
./world/__init__.py
./hello.py
./__init__.py
I expected the config directory to be there, along with the YAML file it contains. What am I doing wrong? Thank you!

Here is an example setup.py that includes the data files, along with a reference for finding out more.
from setuptools import setup, find_packages

setup(
    # ...
    packages=find_packages(),
    package_data={'': ['config/*.yml']},
    # ...
)
See the setuptools documentation on data files for details.
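Note that package_data paths are resolved relative to each package directory, so the pattern above only matches if config/ lives inside the sample package (moving it there is a common fix). Assuming that layout, here is a sketch of reading the installed file at runtime with importlib.resources (Python 3.9+); the throwaway package built in a temp directory just stands in for the installed one:

```python
import os
import sys
import tempfile
from importlib import resources

# Simulate an installed "sample" package that carries config/test.yml
# (package_data only picks up files *inside* a package directory).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sample", "config"))
open(os.path.join(root, "sample", "__init__.py"), "w").close()
with open(os.path.join(root, "sample", "config", "test.yml"), "w") as f:
    f.write("greeting: hello\n")
sys.path.insert(0, root)

# Locate the data file without hard-coding any install path.
cfg = resources.files("sample") / "config" / "test.yml"
print(cfg.read_text())  # -> greeting: hello
```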

Related

Not able to import my own python package after pip installation

I'm trying to make my first Python package publicly available, but I'm having some trouble installing it on another machine; I'm not sure what is wrong. My project is here.
After all the CI steps on the master branch, Travis publishes the latest version to PyPI. After that, we can install the package anywhere:
pip install spin-clustering
But when I try to import it on my regular python it says that the module does not exist.
$ python -c "import spin"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'spin'
My package was originally called "spin", but that name was already taken on PyPI, so I changed it to "spin-clustering". Since scikit-learn is imported as "sklearn", I thought it would be possible to import my package as "spin". Not sure what I'm missing here.
This is my package structure:
├── LICENSE
├── Makefile
├── Pipfile
├── README.md
├── examples
│   ├── circle-example.ipynb
│   └── random-cluster-example.ipynb
├── setup.cfg
├── setup.py
└── spin
    ├── __init__.py
    ├── distances
    │   ├── __init__.py
    │   ├── distances.py
    │   └── tests
    │       └── __init__.py
    ├── neighborhood_spin.py
    ├── side_to_side_spin.py
    ├── tests
    │   ├── __init__.py
    │   ├── test_spin.py
    │   └── test_utils.py
    └── utils.py
And my setup.py:
from setuptools import setup, find_packages

setup(name="spin-clustering",
      maintainer="otaviocv",
      maintainer_email="otaviocv.deluqui#gmail.com",
      description="SPIN clustering method package.",
      license="MIT",
      version="0.0.3",
      python_requires=">=3.6",
      install_requires=[
          'numpy>=1.16.4',
          'matplotlib>=3.1.0'
      ])
In your setup.py, you also need to specify what packages will be installed. The simplest way is using the provided find_packages function, which will scan your folders and try to figure out what the packages are (in some slightly unusual cases, your project organization will make this not work right). Your code imports find_packages, but is not using it.
Since you have no packages listed, nothing is actually installed (except the requirements, if they are missing).
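To see what find_packages would contribute here, a quick sketch that rebuilds the question's layout in a temporary directory (package names taken from the tree above):

```python
import os
import tempfile
from setuptools import find_packages

# Recreate the package tree from the question: spin/ plus its subpackages.
root = tempfile.mkdtemp()
for pkg in ("spin", "spin/distances", "spin/distances/tests", "spin/tests"):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

# This is the list that packages=find_packages() would hand to setup().
print(sorted(find_packages(where=root)))
# -> ['spin', 'spin.distances', 'spin.distances.tests', 'spin.tests']
```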

Setuptools how to package just some modules and files

I'm packaging a little Python package. I'm a complete newbie to Python packaging. My directory structure is as follows (up to second-level nesting):
.
├── data
│   ├── images
│   ├── patches
│   └── train.csv
├── docker
│   ├── check_gpu.py
│   ├── Dockerfile.gcloud_base
│   ├── Dockerfile.gcloud_myproject
├── env.sh
├── gcloud_config_p100.yml
├── legacy
│   ├── __init__.py
│   ├── notebooks
│   └── mypackage
├── notebooks
│   ├── EDA.ipynb
│   ├── Inspect_patches.ipynb
├── README.md
├── requirements.txt
├── scripts
│   ├── create_patches_folds.py
│   └── create_patches.py
├── setup.py
└── mypackage
    ├── data
    ├── img
    ├── __init__.py
    ├── jupyter
    ├── keras_utils
    ├── models
    ├── train.py
    └── util.py
My setup.py:
import os
from setuptools import setup, find_packages

REQUIRED_PACKAGES = [
    "h5py==2.9.0",
    "numpy==1.16.4",
    "opencv-python==4.1.0.25",
    "pandas==0.24.2",
    "keras==2.2.4",
    "albumentations==0.3.1"
]

setup(
    name='mypackage',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(include=["mypackage.*"]),
    include_package_data=False
)
The code I want to package corresponds only to the mypackage directory. That's why I passed "mypackage.*" to find_packages and used include_package_data=False.
If i run:
python setup.py sdist
All project structure gets packaged in the resulting tar.gz file.
Does anyone know how to package just the modules inside mypackage/ plus the top-level README file? I'm not finding this in the setuptools docs.
The first thing to fix is:
packages=find_packages(include=["mypackage"]),
But you also need to understand that sdist is mostly controlled by the file MANIFEST or its template MANIFEST.in, not by setup.py. You can compare what is created by sdist with bdist_wheel or bdist_egg; the content of bdist_* is controlled by setup.py.
So my advice is to create the following MANIFEST.in:
prune *
include README.md
recursive-include mypackage *.py

Python setup.py: How to get find_packages() to identify packages in subdirectories

I'm trying to create a setup.py file where find_packages() recursively finds packages. In this example, foo, bar, and baz are all modules that I want to be installed and available on the python path. For example, I want to be able to do import foo, bar, baz. The bar-pack and foo-pack are just regular non-python directories that will contain various support files/dirs (such as tests, READMEs, etc. specific to the respective module).
├── bar-pack
│   └── bar
│       └── __init__.py
├── baz
│   └── __init__.py
├── foo-pack
│   └── foo
│       └── __init__.py
└── setup.py
Then say that setup.py is as follows:
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1",
    packages=find_packages(),
)
However, when I run python setup.py install or python setup.py sdist, only the baz directory is identified and packaged.
I can simplify it down further, and run the following command, but again, only baz is identified.
python -c "from setuptools import setup, find_packages; print(find_packages())"
['baz']
Do you know how I might extend the search path (or manually hard-code the search path) of find_packages()?
Any help is appreciated.
This is like using the src-layout for the "foo" and "bar" packages, but the flat layout for "baz". It's possible, but requires some custom configuration in the setup.py.
Setuptools' find_packages supports a "where" keyword (docs); you can use that.
setup(
    ...
    packages=(
        find_packages() +
        find_packages(where="./bar-pack") +
        find_packages(where="./foo-pack")
    ),
    ...
)
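Rebuilt in a scratch directory, that combination does pick up all three packages; the paths are the hypothetical ones from the question (note that a plain find_packages() does not descend into bar-pack or foo-pack because they contain no __init__.py):

```python
import os
import tempfile
from setuptools import find_packages

# Recreate the layout: baz at the top level, foo and bar nested
# inside non-package "-pack" directories.
root = tempfile.mkdtemp()
for pkg in ("baz", "bar-pack/bar", "foo-pack/foo"):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

packages = (
    find_packages(where=root)                              # finds only baz
    + find_packages(where=os.path.join(root, "bar-pack"))  # finds bar
    + find_packages(where=os.path.join(root, "foo-pack"))  # finds foo
)
print(packages)  # -> ['baz', 'bar', 'foo']
```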
Since find_packages returns a plain old list, you could also just list your packages manually, and that's arguably easier / less magical.
setup(
    ...
    packages=["baz", "bar", "foo"],
    ...
)
The non-standard directory structure means you'll also want to specify the package_dir structure for distutils, which describes where to put the installed package(s).
Piecing it all together:
setup(
    name="mypackage",
    version="0.1",
    packages=["baz", "bar", "foo"],
    package_dir={
        "": ".",
        "bar": "./bar-pack/bar",
        "foo": "./foo-pack/foo",
    },
)
The above installer will create this directory structure in site-packages:
.venv/lib/python3.9/site-packages
├── bar
│   ├── __init__.py
│   └── __pycache__
│       └── __init__.cpython-39.pyc
├── baz
│   ├── __init__.py
│   └── __pycache__
│       └── __init__.cpython-39.pyc
├── foo
│   ├── __init__.py
│   └── __pycache__
│       └── __init__.cpython-39.pyc
└── mypackage-0.1.dist-info
    ├── INSTALLER
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── WHEEL
    ├── direct_url.json
    └── top_level.txt

Packaging python with cython extension

I'm trying to build a package that uses both Python and Cython modules. The problem I'm having is with imports after building and installing: I'm not sure how to make files import from the .so file generated by the build process.
Before building my folder structure looks like this
root/
├── c_integrate.c
├── c_integrate.pyx
├── cython_builder.py
├── __init__.py
├── integrator_class.py
├── integrator_modules
│   ├── cython_integrator.py
│   ├── __init__.py
│   ├── integrator.py
│   ├── numba_integrator.py
│   ├── numpy_integrator.py
│   ├── quadratic_error.png
│   ├── report3.txt
│   ├── report4.txt
│   └── report5.txt
├── report6.txt
├── setup.py
└── test
    ├── __init__.py
    └── test_integrator.py
Building with python3.5 setup.py build gives this new folder in root
root/build/
├── lib.linux-x86_64-3.5
│   ├── c_integrate.cpython-35m-x86_64-linux-gnu.so
│   ├── integrator_modules
│   │   ├── cython_integrator.py
│   │   ├── __init__.py
│   │   ├── integrator.py
│   │   ├── numba_integrator.py
│   │   └── numpy_integrator.py
│   └── test
│       ├── __init__.py
│       └── test_integrator.py
The setup.py file looks like this
from setuptools import setup, Extension, find_packages
import numpy

setup(
    name="integrator_package",
    author="foo",
    packages=find_packages(),
    ext_modules=[Extension("c_integrate", ["c_integrate.c"])],
    include_dirs=[numpy.get_include()],
)
My question is then: how do I write import statements for the functions from the .so file in integrator_class.py in the root, and in cython_integrator and test_integrator located in the build directory? Appending to sys.path seems like a quick and dirty solution that I don't much like.
EDIT:
As pointed out in the comments, I haven't installed the package. This is because I don't know what to write to import from the .so file.
In no specific order:
The file setup.py is typically located below the root of a project. Example:
library_name/
__init__.py
file1.py
setup.py
README
Then, the build directory appears alongside the project's source and not in the project source.
To import the file c_integrate.cpython-35m-x86_64-linux-gnu.so in Python, just import "c_integrate". The rest of the naming is taken care of automatically as it is just the platform information. See PEP 3149
A valid module is one of
a directory with a modulename/__init__.py file
a file named modulename.py
a file named modulename.PLATFORMINFO.so
located, of course, on the Python path. So there is no need for an __init__.py file for a compiled Cython module.
For your situation, move the Cython code into the project directory and either use a relative import (from . import c_integrate) or a full import (from integrator_modules import c_integrate); the latter only works when your package is installed.
Some of this information can be found in my blog post on Cython modules: http://pdebuyl.be/blog/2017/cython-module.html
I believe that this should let you build a proper package, comment below if not.
EDIT: to complete the configuration (see the comments below), the poster also:
Fixed the module path in the setup.py file so that it is the full module name starting from the PYTHONPATH: Extension("integrator_package.integrator_modules.c_integrator", ["integrator_package/integrator_modules/c_integrator.c"]) instead of Extension("c_integrate", ["c_integrate.c"]).
Cythonized the module, built it, and used it with the same Python interpreter.
Further comment: the setup.py file can cythonize the file as well. Include the .pyx file instead of the .c file as the source.
cythonize(Extension('integrator_package.integrator_modules.c_integrator',
                    ["integrator_package/integrator_modules/c_integrator.pyx"],
                    include_dirs=[numpy.get_include()]))
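Putting those pieces together, the whole setup.py might look like the following. This is only a sketch following the thread's names; it assumes Cython and NumPy are installed and the sources have been moved under integrator_package/ as described above, so it is not meant to run outside that project:

```python
import numpy
from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize

setup(
    name="integrator_package",
    author="foo",
    packages=find_packages(),
    # cythonize compiles the .pyx to C and builds the extension in place
    # of listing a pre-generated .c file.
    ext_modules=cythonize([
        Extension(
            "integrator_package.integrator_modules.c_integrator",
            ["integrator_package/integrator_modules/c_integrator.pyx"],
            include_dirs=[numpy.get_include()],
        )
    ]),
)
```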

Python setup.py points to . as opposed to the directory specified in setup.py?

This is my current project setup:
.
├── README.md
├── build
│   ├── bdist.macosx-10.8-intel
│   └── lib
├── dist
│   └── giordano-0.1-py2.7.egg
├── giordano.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── not-zip-safe
│   └── top_level.txt
├── requirements.txt
├── setup.py
├── src
│   ├── giordano
│   └── spider
├── test.txt
└── venv
    ├── bin
    ├── include
    ├── lib
    └── share
And this is my setup file:
from setuptools import setup

setup(name='giordano',
      version='0.1',
      packages=['giordano'],
      package_dir={'giordano': 'src/giordano'},
      zip_safe=False)
When I do python setup.py install, I am able to import giordano in my code without problems.
However, when I am doing python setup.py develop, this is the console output:
[venv] fixSetup$ python setup.py develop
running develop
running egg_info
writing giordano.egg-info/PKG-INFO
writing top-level names to giordano.egg-info/top_level.txt
writing dependency_links to giordano.egg-info/dependency_links.txt
reading manifest file 'giordano.egg-info/SOURCES.txt'
writing manifest file 'giordano.egg-info/SOURCES.txt'
running build_ext
Creating /Users/blah/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/giordano.egg-link (link to .)
Removing giordano 0.1 from easy-install.pth file
Adding giordano 0.1 to easy-install.pth file
Installed /Users/blah/Dropbox/projects/Giordano
Processing dependencies for giordano==0.1
Finished processing dependencies for giordano==0.1
I noticed that the egg is linked to . as opposed to src/giordano. I can no longer import giordano in my code.
Any ideas why develop is not respecting package_dir?
Try with 'giordano': 'src'. distutils/distribute looks for the module or package name in the directory you specify; in the code you pasted, the value is one directory too deep.
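Following that suggestion, the setup.py from the question would become the sketch below; only the package_dir value changes, and it is a configuration fragment rather than something runnable on its own:

```python
from setuptools import setup

setup(name='giordano',
      version='0.1',
      packages=['giordano'],
      # point at the directory *containing* the package, per the answer,
      # rather than at src/giordano itself
      package_dir={'giordano': 'src'},
      zip_safe=False)
```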
