How to use find_packages() to package all files from subdirectory - python

I am creating a Python package (for the first time). I am able to package the contents, but I am having an issue with packaging one of the data files that I have in a subdirectory. My directory structure looks like this:
├── Jenkinsfile
├── MANIFEST.in
├── README.md
├── __init__.py
├── country
│   ├── __init__.py
│   └── folder1
│       ├── file1.py
│       ├── file2.py
│       ├── file3.py
│       └── folder2
│           ├── file4.xlsx
│           └── __init__.py
├── repoman.yaml
├── requirements.txt
├── setup.py
└── test
    └── unit
        ├── __init__.py
        └── test_unit.py
My setup.py has the following get_packages() function:
from setuptools import find_packages

def get_packages():
    return find_packages(exclude=['doc', 'imgs', 'test'])
When I build the package, it does not include file4.xlsx. Can anyone tell me why that is the case and how I can fix it?
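The underlying reason is that find_packages() only discovers packages, i.e. directories containing an __init__.py; it never collects data files such as .xlsx. A small self-contained demonstration (the temporary tree below loosely mirrors the question's layout):

```python
# find_packages() returns package names only -- data files are ignored,
# which is why file4.xlsx never makes it into the build.
import os
import tempfile

from setuptools import find_packages

root = tempfile.mkdtemp()
for d in ('country', 'country/folder1', 'country/folder1/folder2'):
    os.makedirs(os.path.join(root, d))
    open(os.path.join(root, d, '__init__.py'), 'w').close()
# A data file next to the modules:
open(os.path.join(root, 'country/folder1/folder2/file4.xlsx'), 'w').close()

packages = sorted(find_packages(root))
print(packages)  # ['country', 'country.folder1', 'country.folder1.folder2']
```

Data files therefore have to be declared separately, which is what the package_data / MANIFEST.in fix described in the answer does.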

I found this answer, which is similar to what I want to do:
I had to update my setup.py to include package_data, and my MANIFEST.in to include the *.xlsx file (by providing the full directory path to the Excel file).
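A minimal sketch of that fix, using the paths from the question's tree (the package name is a placeholder, and the assumption that file4.xlsx lives inside folder2 is mine):

```python
# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name='country',  # hypothetical project name
    version='0.1.0',
    packages=find_packages(exclude=['doc', 'imgs', 'test']),
    include_package_data=True,
    # Map non-Python files to the package that contains them:
    package_data={'country.folder1.folder2': ['*.xlsx']},
)
```

with the matching MANIFEST.in line `include country/folder1/folder2/*.xlsx` so the file is also picked up for source distributions.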

Related

Is there a way to include extra files when building a PEX (i.e. with MANIFEST.in)?

I have a directory structure similar to the following:
├── myproj
│   ├── utils.py
│   ├── __init__.py
│   └── routes
│       ├── __init__.py
│       ├── auth.py
│       └── stuff.py
├── html
│   ├── index.html
│   └── about.html
├── MANIFEST.in
├── setup.cfg
└── setup.py
The contents of MANIFEST.in are:
graft html
The following post alludes to being able to use MANIFEST.in with PEX (Python PEX: Pack a package with its sub-packages), but when I run either pex . -o myproject or python setup.py bdist_pex, the html/ directory is not included (verified via unzip -Z1 myproject on the resulting output). It is included when running python setup.py sdist.
How do I include these extra html files when building a PEX binary?
Defining a MANIFEST.in alone isn't enough; you also need to set the include_package_data option to True in setup.cfg.
This option only includes extra files found inside packages, so you must also move the html directory inside the myproj package.
So the directory structure looks like:
├── myproj
│   ├── utils.py
│   ├── __init__.py
│   ├── routes
│   │   ├── __init__.py
│   │   ├── auth.py
│   │   └── stuff.py
│   └── html
│       ├── index.html
│       └── about.html
├── MANIFEST.in
├── setup.cfg
└── setup.py
The contents of MANIFEST.in are:
graft myproj/html
And setup.cfg contains in the [options] section:
include_package_data = True
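Putting the pieces together, the relevant parts of setup.cfg might look like this (the [metadata] values are placeholders; only the include_package_data line comes from the answer above):

```ini
[metadata]
name = myproj
version = 0.0.1

[options]
packages = find:
include_package_data = True
```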

Python package-namespace: common test/docs/setup.py or one per namespace - which is the better pattern?

(In the interest of transparency, this is a follow up to a question asked here)
I'm dealing with related files for which a namespace package seems a good fit. I'm following the guide from the packaging authority, which places a setup.py in each namespace package:
mynamespace-subpackage-a/
    setup.py
    mynamespace/
        subpackage_a/
            __init__.py
mynamespace-subpackage-b/
    setup.py
    mynamespace/
        subpackage_b/
            __init__.py
            module_b.py
In my tests, I created a similar project. Apart from setup.py, I placed my unit tests, docs, and other stuff per namespace (I left out some of the directories for compactness). I used PyScaffold to generate the namespaces.
├── namespace-package-test.package1
│   ├── LICENSE.txt
│   ├── README.md
│   ├── setup.cfg
│   ├── setup.py
│   ├── src
│   │   └── pkg1
│   │       ├── cli
│   │       │   ├── __init__.py
│   │       │   └── pkg1_cli.py
│   │       └── __init__.py
│   └── tests
├── namespace-package-test.package2
│   ├── AUTHORS.rst
However, I then noticed that PyScaffold has the option to create namespace packages via the putup command.
(venv) steve@PRVL10SJACKSON:~/Temp$ putup --force my-package -p pkg1 --namespace namespace1
(venv) steve@PRVL10SJACKSON:~/Temp$ putup --force my-package -p pkg1 --namespace namespace2
This creates a folder structure like this:
├── AUTHORS.rst
├── CHANGELOG.rst
├── LICENSE.txt
├── README.rst
├── requirements.txt
├── setup.cfg
├── setup.py
├── src
│   ├── namespace1
│   │   ├── __init__.py
│   │   └── pkg1
│   │       ├── __init__.py
│   │       └── skeleton.py
│   └── namespace2
│       ├── __init__.py
│       └── pkg1
│           ├── __init__.py
│           └── skeleton.py
└── tests
    ├── conftest.py
    └── test_skeleton.py
So I'm conflicted: I trust the team at PyScaffold, but this goes against the example from the packaging authority.
Are both approaches valid?
Is there a reason to choose one approach over the other?
The idea behind the namespace option in PyScaffold is to share/reuse namespaces across projects (as opposed to having more than one namespace inside a single project), or in other words, to split a larger project into independently maintained/developed projects.
To the best of my understanding, a structure like the one you showed in the 4th code block will not work. Using putup --force twice with two different namespaces in the same root folder is not the intended/supported usage.
The approach of PyScaffold is the same as that of the packaging authority; the only difference is that PyScaffold assumes you have only one package contained in a single project and git repository (PyScaffold also uses a src directory, for the reasons explained in Ionel's blog post).
The reason for adopting one setup.py per namespace+package is that it is required for building separate distribution files (i.e., you need one setup.py per *.whl).
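As a sketch, the per-package setup.py in the packaging-authority layout quoted above might use find_namespace_packages, so that the mynamespace directory itself needs no __init__.py (the name and version are placeholders):

```python
# mynamespace-subpackage-a/setup.py (sketch)
from setuptools import setup, find_namespace_packages

setup(
    name='mynamespace-subpackage-a',
    version='0.1.0',
    # Only pick up packages under the shared namespace:
    packages=find_namespace_packages(include=['mynamespace.*']),
)
```

Each subpackage repeats this with its own name, which is what allows each to be built into its own wheel.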

What is the correct way to distribute "bin" and "tests" directories for a Python package?

I have created a python package.
On the advice of several internet sources (including https://github.com/pypa/sampleproject), I have set up the directory structure like so:
root_dir
├── bin
│   └── do_stuff.py
├── MANIFEST.in
├── README.md
├── my_lib
│   ├── __init__.py
│   ├── __main__.py
│   └── my_lib.py
├── setup.cfg
├── setup.py
├── important_script.py
└── tests
    ├── __init__.py
    └── test_lib.py
I have included tests, bin, and important_script.py in the manifest, and set include_package_data to True in setup.py.
However, after running pip install root_dir, I see that it correctly installed my_lib, but bin and tests were placed directly into Lib/site-packages as if they were separate packages.
I can't find important_script.py at all, and I don't think it was installed.
How do I correctly include these files/directories in my installation?
EDIT
So, it turns out that the bin and tests directories being placed directly into the site-packages directory was caused by something I was doing previously, but I can't determine what. At some point a build and a dist directory were generated in my root_dir (I assume by pip or setuptools?), and any changes I made to the project after that were not actually showing up in the installed package. After deleting those directories, I am no longer able to reproduce that issue.
The sample project distributes neither bin nor tests; it even explicitly excludes tests.
To include bin you should use scripts or entry_points (like in the sample project). Add this to the setup() call in your setup.py:
scripts=['bin/do_stuff.py'],
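For the entry_points route that the sample project uses, a sketch might look like this. It assumes you move do_stuff.py into the package (e.g. my_lib/do_stuff.py) and that it defines a main() function; the command name do-stuff is illustrative:

```python
# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name='my_lib',
    version='0.1.0',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            # Installs a `do-stuff` command that calls my_lib.do_stuff:main()
            'do-stuff=my_lib.do_stuff:main',
        ],
    },
)
```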
To include tests you should restructure your tree so that the tests directory sits under the package directory:
root_dir
├── bin
│   └── do_stuff.py
├── MANIFEST.in
├── README.md
├── my_lib
│   ├── __init__.py
│   ├── __main__.py
│   ├── my_lib.py
│   └── tests
│       ├── __init__.py
│       └── test_lib.py
├── setup.cfg
├── setup.py
└── important_script.py

Why does "pip install" not include my package_data files?

I can't figure out why, when I run pip install ../path_to_my_proj/ (from a virtualenv), none of the data files are copied across to the site-packages/myproj/ folder. The Python packages are copied across correctly.
Python version: 3.4.4
My project directory is like this:
├── myproj
│   ├── __init__.py
│   ├── module1.py
│   └── module2.py
├── data_files
│   ├── subfolder1
│   │   ├── datafile.dll
│   │   └── datafile2.dll
│   └── subfolder2
│       ├── datafile3.dll
│       └── datafile4.dll
├── MANIFEST.in
└── setup.py
And my MANIFEST.in looks like
recursive-include data_files *
include README.md
My setup.py looks like:
setup(
    name='myproj',
    version='0.1.1',
    install_requires=['requirement'],
    packages=['myproj'],
    include_package_data=True,
)
I encountered the same problem and asked about it on https://gitter.im/pypa/setuptools. The result? You just can't do that: data_files must live under myproj.
You can fake it by putting an empty __init__.py in data_files, but then it will be put into PYTHONHOME\Lib\site-packages alongside myproj at the same level, polluting the namespace.
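A sketch of the setup.py that follows from that advice, after moving data_files under myproj/ so the DLLs become package data (the glob patterns are my assumption about which files you want shipped):

```python
# setup.py (sketch), with the tree rearranged so data_files/ sits
# inside myproj/ -- e.g. myproj/data_files/subfolder1/datafile.dll
from setuptools import setup

setup(
    name='myproj',
    version='0.1.1',
    install_requires=['requirement'],
    packages=['myproj'],
    include_package_data=True,
    package_data={
        'myproj': [
            'data_files/subfolder1/*.dll',
            'data_files/subfolder2/*.dll',
        ],
    },
)
```

The MANIFEST.in line would then become `recursive-include myproj/data_files *` to match the new location.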

Versioning multiple projects with versioneer within a single git repository

I have a single, large git repo containing many different projects (not submodules). A few of these are Python projects whose versions I'd like to track with Python's versioneer; others may be completely independent projects (say, in Haskell). The directory structure looks like this:
myrepository
├── .git
├── README.md
├── project1/
│   ├── project1/
│   │   ├── __init__.py
│   │   └── _version.py
│   ├── versioneer.py
│   ├── setup.cfg
│   └── setup.py
├── project2/
├── project3/
└── project4/
    ├── project4/
    │   ├── __init__.py
    │   └── _version.py
    ├── versioneer.py
    ├── setup.cfg
    └── setup.py
This doesn't play well with versioneer because it can't discover the .git directory at the project root level, so I get a version of 0+unknown.
Questions:
Is there a suggested way to use versioneer with a single monolithic repo with multiple projects?
Depending on the answer to the above, is it recommended that my git tags read like project1-2.1.0 and project2-1.3.1, or should I use unified tags like 1.2.1 and 1.2.2?
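For reference, each project's setup.cfg in a layout like this carries its own [versioneer] section; versioneer's tag_prefix option is what ties a project to per-project tags such as project1-2.1.0. A sketch of the values in play (not a confirmed fix for the 0+unknown issue, which stems from the .git directory sitting above the project root):

```ini
# project1/setup.cfg (sketch)
[versioneer]
VCS = git
style = pep440
versionfile_source = project1/_version.py
versionfile_build = project1/_version.py
tag_prefix = project1-
parentdir_prefix = project1-
```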
