How can I specify an arbitrary set of files (non necessarily .py files) so that they are distributed and installed just like normal Python modules?
Some background
I am using distutils to distribute and install my Python library. One of the modules in this library imports a 3rd-party Python extension called bpy.so (this is a Blender module). The bpy.so extension requires some other files, as well. I want to distribute and install bpy.so and the additional required files with my library.
One way to do this is by specifying all of the extra files as data_files to setup(). However, I don't know how to reliably specify the same installation directory as is used for my pure python modules (e.g. /usr/local/lib/python3.2/dist-packages).
I can distribute the extra files by creating a MANIFEST.in file (see this question), but I also want to install the files.
I would suggest to use setuptools on top of distutils.
This is a good reference document to get started with it. The advantage of using setuptools is that it has a few nice features like the possibility to include all files under a given directory (package) automatically or according to filters. This is the section of the above document dealing with this specific aspect.
HTH!
PS: distutils' current situation/limitations are the source of much complain from most of its users. It seems a new generation of the library is on its way though!
Related
The book "Learning Python" by Mark Lutz states that in Python one can use various types of external modules, which include .py files, .zip archives, C/C++ compiled libraries and others. My question is, how does one usually handle installation of each type of module?
For example, I know that to use a .py module, I simply need to locate it with import. What about something like .dll or .a? Or for example, I found an interesting library on GitHub, which has no installation manual. How do I know which files to import?
Also, are there any ways of installing modules besides pip?
The answer depends on what you want to do.
You can use Ninja for example to use C++ modules and cython for C and there are various packages for almost any type of compiled code.
You can install packages via pip using the pypi package repository or by using cloned repositories that have a setup.py file inside.
Any other python based repo can be imported either by a custom build script (that they will provide) or by directly importing the relevant Python files. This will require you the dive into the code and check what are the relevant files.
Also, are there any ways of installing modules besides pip?
Yes, according to Installing Python Modules (Legacy version) modules packaged using distutils should be downloaded, unpacked and command
python setup.py install
or similar should be run. Beware that
The entire distutils package has been deprecated and will be removed
in Python 3.12.
I'm trying to create and distribute (with pip) a Python package that has Python code, and C++ code compiled to a .pyd file with Pybind11 (using Visual Studio 2019). I also want to include .pyi stub files, for VScode and other editors. I can't find much documentation on doing this correctly.
I'd like to be able to just install the package via pip as normal, and write from mymodule.mysubmodule import myfunc etc like a normal Python package, including autocompletes, type annotations, VScode intellisense etc using the .pyi files I'd write.
My C++ code is in multiple cpp and header files. It uses a few standard libraries, and a few external libraries (such as boost). It defines a single module, and 2 submodules. I want to be able to distribute this on Windows and Linux, and for x86 and x64. I am currently targeting Python 3.9, and the c++17 standard.
How should I structure and distribute this package? Do I include the c++ source files, and create a setup.py similar to the Pybind11 example? If so, how do I include the external libraries? And how do I structure the .pyi stub files? Does this mean whoever tries to install my package would need a c++ compiler as well?
Or, should I compile my c++ to a .pyd/.so file for each platform and architecture? If so, is there a way to specify which one gets installed through pip? And again, how would I structure the .pyi stubs?
Generating .pyi stubs
The pybind11 issue mentions a couple of tools (1, 2) to generate stubs for binary modules. There could be more, but I'm not aware of others. Unfortunately both are far from being perfect, so you probably still need to check and adjust the generated stubs manually.
Distribution of .pyi stubs
After correction of stubs you just include those .pyi files in you distribution (e.g. in wheel or as sources) along with py.typed indication file or, alternatively, distribute them separately as standalone package (e.g. mypackage-stubs).
Building wheels
Wheels allows users of your library to install it in binary form, i.e. without compilation. Wheels makes use of older compilers in order to be compatible with greater number of systems/platforms, so you might face some troubles with a C++17 library. (C++11 is old enough and should have no problems with wheels).
Building wheels for various platforms is tedious, the pybind11's python_example uses cibuildwheels package to do that, I would recommend this route if you are already using CI.
If wheels are missing for target platform the pip will attempt to install from source. This would require compiler and 3rd party libraries you are using to be already installed.
Maybe conda?
If your setup is complex and requires a number of 3rd party libraries it might be worth to write a conda recipe and use conda-forge to generate binary versions of the package. Conda is superior to pip, since it can manage non-python dependencies as well.
I have a simple script that has a dependency on dnspython for parsing zone files. I would like to distribute this script as a single .py that users can run just so long as they have 2.6/2.7 installed. I don't want to have the user install dependencies site-wide as there might be conflicts with existing packages/versions, nor do I want them to muck around with virtualenv. I was wondering if there was a way to embed a package like dnspython inside the script (gzip/base64) and have that script access that package at runtime. Perhaps unpack it into a dir in /tmp and add that to sys.path? I'm not concerned about startup overhead, I just want a single .py w/ all dependencies included that I can distribute.
Also, there would be no C dependencies to build, only pure python packages.
Edit: The script doesn't have to be a .py. Just so long as it is a single executable file.
You can package multiple Python files up into a .egg. Egg files are essentially just zip archives with well defined metadata - look at the setuptools documentation to see how to do this. Per the docs you can make egg files directly executable by specifying the entry point. This would give you a single executable file that can contain your code + any other dependencies.
EDIT: Nowadays I would recommend building a pex to do this. pex is basically an executable zip file with non stdlib dependencies. It doesn't contain a python distribution (like py2app/py2exe) but holds everything else and can be built with a single command line invocation. https://pex.readthedocs.org/en/latest/
The simplest way is just to put your python script named __main__.py with pure Python dependencies in a zip archive, example.
Otherwise PyInstaller could be used to produce a stand-alone executable.
please don't do this. If you do DO NOT make a habit of it.
pydns is BDS licensed but if you try to "embed" a gpl module in this way you could get in trouble
you can learn to use setuptools and you will be much happier in the long run
setuptools will handle the install of dependencies you identified (I'm not sure if the pydns you are using is pure python so you might create problems for your users if you try to add it yourself without knowing their environment)
you can set a url or pypi so that people could upgrade your script with easy_install -U
I am a bit confused. There seem to be two different kind of Python packages, source distributions (setup.py sdist) and egg distributions (setup.py bdist_egg).
Both seem to be just archives with the same data, the python source files. One difference is that pip, the most recommended package manager, is not able to install eggs.
What is the difference between the two and what is 'the' way to do distribute my packages?
(Note, I am not wanting to distribute my packages through PyPI, but I want to use a package manager that fetches my dependencies from PyPI)
setup.py sdist creates a source distribution: it contains setup.py, the source files of your module/script (.py files or .c/.cpp for binary modules), your data files, etc. The result is an archive that can then be used to recompile everything on any platform.
setup.py bdist (and bdist_*) creates a built distribution: it includes .pyc files, .so/.dll/.dylib for binary modules, .exe if using py2exe on Windows, your data files... but no setup.py. The result is an archive that is specific to a platform (for example linux-x86_64) and to a version of Python, and that can be installed simply by extracting it into the root of your filesystem (executables are in /usr/bin (or equivalent), data files in /usr/share, modules in /usr/lib/pythonX.X/site-packages/...). You can even build rpm archives that can be directly installed using your package manager.
2021 update: the tools to build and use eggs no longer exist in Python.
There are many more than two different kind of Python (distribution) packages. This command lists many subcommands:
$ python setup.py --help-commands
Notice the various different bdist types.
An egg was a new package type, introduced by setuptools but later adopted by the standard library. It is meant to be installed monolithic onto sys.path. This differs from an sdist package which is meant to have setup.py install run, copying each file into place and perhaps taking other actions as well (building extension modules, running additional arbitrary Python code included in the package).
eggs are largely obsolete at this point in time. EDIT: eggs are gone, they were used with the command "easy_install" that's been removed from Python.
The favored packaging format now is the "wheel" format, notably used by "pip install".
Whether you create an sdist or an egg (or wheel) is independent of whether you'll be able to declare what dependencies the package has (to be downloaded automatically at installation time by PyPI). All that's necessary for this dependency feature to work is for you to declare the dependencies using the extra APIs provided by distribute (the successor of setuptools) or distutils2 (the successor of distutils - otherwise known as packaging in the current development version of Python 3.x).
https://packaging.python.org/ is a good resource for further information about packaging. It covers some of the specifics of declaring dependencies (eg install_requires but not extras_require afaict).
I had my fair chance of getting through the python management of modules, and every time is a challenge: packaging is not what people do every day, and it becomes a burden to learn, and a burden to remember, even when you actually do it, since this happens normally once.
I would like to collect here the definitive overview of how import, package management and distribution works in python, so that this question becomes the definitive explanation for all the magic that happens under the hood. Although I understand the broad level of the question, these things are so intertwined that any focused answer will not solve the main problem: understand how all works, what is outdated, what is current, what are just alternatives for the same task, what are the quirks.
The list of keywords to refer to is the following, but this is just a sample out of the bunch. There's a lot more and you are welcome to add additional details.
PyPI
setuptools / Distribute
distutils
eggs
egg-link
pip
zipimport
site.py
site-packages
.pth files
virtualenv
handling of compiled modules in eggs (with and without installation via easy_install)
use of get_data()
pypm
bento
PEP 376
the cheese shop
eggsecutable
Linking to other answers is probably a good idea. As I said, this question is for the high-level overview.
For the most part, this is an attempt to look at the packaging/distribution side, not the mechanics of import. Unfortunately, packaging is the place where Python provides way more than one way to do it. I'm just trying to get the ball rolling, hopefully others will help fill what I miss or point out mistakes.
First of all there's some messy terminology here. A directory containing an __init__.py file is a package. However, most of what we're talking about here are specific versions of packages published on PyPI, one of it's mirrors, or in a vendor specific package management system like Debian's Apt, Redhat's Yum, Fink, Macports, Homebrew, or ActiveState's pypm.
These published packages are what folks are trying to call "Distributions" going forward in an attempt to use "Package" only as the Python language construct. You can see some of that usage in PEP-376 PEP-376.
Now, your list of keywords relate to several different aspects of the Python Ecosystem:
Finding and publishing python distributions:
PyPI (aka the cheese shop)
PyPI Mirrors
Various package management tools / systems: apt, yum, fink, macports, homebrew
pypm (ActiveState's alternative to PyPI)
The above are all services that provide a place to publish Python distributions in various formats. Some, like PyPI mirrors and apt / yum repositories can be run on your local machine or within your companies network but folks typically use the official ones. Most, if not all provide a tool (or multiple tools in the case of PyPI) to help find and download distributions.
Libraries used to create and install distributions:
setuptools / Distribute
distutils
Distutils is the standard infrastructure on which Python packages are compiled and built into distributions. There's a ton of functionality in distutils but most folks just know:
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward#python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
And to some extent that's a most of what you need. With the prior 9 lines of code you have enough information to install a pure Python package and also the minimal metadata required to publish that package a distribution on PyPI.
Setuptools provides the hooks necessary to support the Egg format and all of it's features and foibles. Distribute is an alternative to Setuptools that adds some features while trying to be mostly backwards compatible. I believe Distribute is going to be included in Python 3 as the successor to Distutil's from distutils.core import setup.
Both Setuptools and Distribute provide a custom version of the distutils setup command
that does useful things like support the Egg format.
Python Distribution Formats:
source
eggs
Distributions are typically provided either as source archives (tarball or zipfile). The standard way to install a source distribution is by downloading and uncompressing the archive and then running the setup.py file inside.
For example, the following will download, build, and install the Pygments syntax highlighting library:
curl -O -G http://pypi.python.org/packages/source/P/Pygments/Pygments-1.4.tar.gz
tar -zxvf Pygments-1.4.tar.gz
cd Pygments-1.4
python setup.py build
sudo python setup.py install
Alternatively you can download the Egg file and install it. Typically this is accomplished by using easy_install or pip:
sudo easy_install pygments
or
sudo pip install pygments
Eggs were inspired by Java's Jarfiles and they have quite a few features you should read about here
Python Package Formats:
uncompressed directories
zipimport (zip compressed directories)
A normal python package is just a directory containing an __init__.py file and an arbitrary number of additional modules or sub-packages. Python also has support for finding and loading source code within *.zip files as long as they are included on the PYTHONPATH (sys.path).
Installing Python Packages:
easy_install: the original egg installation tool, depends on setuptools
pip: currently the most popular way to install python packages. Similar to easy_install but more flexible and has some nice features like requirements files to help document dependencies and reproduce deployments.
pypm, apt, yum, fink, etc
Environment Management / Automated Deployment:
bento
buildout
virtualenv (and virtualenvwrapper)
The above tools are used to help automate and manage dependencies for a Python project. Basically they give you tools to describe what distributions your application requires and automate the installation of those specific versions of your dependencies.
Locations of Packages / Distributions:
site-packages
PYTHONPATH
the current working directory (depends on your OS and environment settings)
By default, installing a python distribution is going to drop it into the site-packages directory. That directory is usually something like /usr/lib/pythonX.Y/site-packages.
A simple programmatic way to find your site-packages directory:
from distuils import sysconfig
print sysconfig.get_python_lib()
Ways to modify your PYTHONPATH:
Python's import statement will only find packages that are located in one of the directories included in your PYTHONPATH.
You can inspect and change your path from within Python by accessing:
import sys
print sys.path
sys.path.append("/home/myname/lib")
Besides that, you can set the PYTHONPATH environment variable like you would any other environment variable on your OS or you could use:
.pth files: *.pth files located in directories that are already on your PYTHONPATH are read and each line of the *.pth file is added to your PYTHONPATH. Basically any time you would copy a package into a directory on your PYTHONPATH you could instead create a mypackages.pth. Read more about *.pth files: site module
egg-link files: Internal structure of python eggs they are a cross platform alternative to symbolic links. Creating an egg link file is similar to creating a pth file.
site.py modifications
To add the above /home/myname/lib to site-packages with a *.pth file you'd create a *.pth file. The name of the file doesn't matter but you should still probably choose something sensible.
Let's create myname.pth:
# myname.pth
/home/myname/lib
That's it. Drop that into sysconfig.get_python_lib() on your system or any other directory in your PYTHONPATH and /home/myname/lib will be added to the path.
For packaging question, this should help http://guide.python-distribute.org/
For import, the old article from Fredrik Lundh http://effbot.org/zone/import-confusion.htm still a very good starting point.
I recommend Tarek Ziadek's Book on Python. There's a chapter dedicated to packaging and distribution.
I don't think import needs to be explored (Python's namespacing and importing functionality is intuitive IMHO).
I use pip exclusively now. I haven't run into any issues with it.
However, the topic of packaging and distribution is something worth exploring. Instead of giving a lengthy answer, I will say this:
I learned how to package and distribute my own "packages" by simply copying how Pylons or many other open-source packages do it. I then combined that sort-of template with reading up of the docs to flesh it out even further and have come up with a solid distribution method.
When you grok package management and distribution for python (distutils and pypi) it's actually quite powerful. I like it a lot.
[edit]
I also wanted to add in a bit about virtualenv. USE IT. I create a virtualenv for every project and I always use --no-site-packages; I install all the packages I need for that particular project (even if it's something common amongst them all, like lxml) inside the virtualev. It keeps everything isolated and it's much easier for me to maintain the grouping in my head (rather than trying to keep track of what's where and for which version of python!)
[/edit]