I'm trying to create and distribute (with pip) a Python package that has Python code, and C++ code compiled to a .pyd file with Pybind11 (using Visual Studio 2019). I also want to include .pyi stub files, for VScode and other editors. I can't find much documentation on doing this correctly.
I'd like to be able to just install the package via pip as normal, and write from mymodule.mysubmodule import myfunc etc like a normal Python package, including autocompletes, type annotations, VScode intellisense etc using the .pyi files I'd write.
My C++ code is in multiple cpp and header files. It uses a few standard libraries, and a few external libraries (such as boost). It defines a single module, and 2 submodules. I want to be able to distribute this on Windows and Linux, and for x86 and x64. I am currently targeting Python 3.9, and the c++17 standard.
How should I structure and distribute this package? Do I include the c++ source files, and create a setup.py similar to the Pybind11 example? If so, how do I include the external libraries? And how do I structure the .pyi stub files? Does this mean whoever tries to install my package would need a c++ compiler as well?
Or, should I compile my c++ to a .pyd/.so file for each platform and architecture? If so, is there a way to specify which one gets installed through pip? And again, how would I structure the .pyi stubs?
Generating .pyi stubs
The pybind11 issue mentions a couple of tools (1, 2) to generate stubs for binary modules. There could be more, but I'm not aware of others. Unfortunately both are far from being perfect, so you probably still need to check and adjust the generated stubs manually.
Distribution of .pyi stubs
After correction of stubs you just include those .pyi files in you distribution (e.g. in wheel or as sources) along with py.typed indication file or, alternatively, distribute them separately as standalone package (e.g. mypackage-stubs).
Building wheels
Wheels allows users of your library to install it in binary form, i.e. without compilation. Wheels makes use of older compilers in order to be compatible with greater number of systems/platforms, so you might face some troubles with a C++17 library. (C++11 is old enough and should have no problems with wheels).
Building wheels for various platforms is tedious, the pybind11's python_example uses cibuildwheels package to do that, I would recommend this route if you are already using CI.
If wheels are missing for target platform the pip will attempt to install from source. This would require compiler and 3rd party libraries you are using to be already installed.
Maybe conda?
If your setup is complex and requires a number of 3rd party libraries it might be worth to write a conda recipe and use conda-forge to generate binary versions of the package. Conda is superior to pip, since it can manage non-python dependencies as well.
Related
The book "Learning Python" by Mark Lutz states that in Python one can use various types of external modules, which include .py files, .zip archives, C/C++ compiled libraries and others. My question is, how does one usually handle installation of each type of module?
For example, I know that to use a .py module, I simply need to locate it with import. What about something like .dll or .a? Or for example, I found an interesting library on GitHub, which has no installation manual. How do I know which files to import?
Also, are there any ways of installing modules besides pip?
The answer depends on what you want to do.
You can use Ninja for example to use C++ modules and cython for C and there are various packages for almost any type of compiled code.
You can install packages via pip using the pypi package repository or by using cloned repositories that have a setup.py file inside.
Any other python based repo can be imported either by a custom build script (that they will provide) or by directly importing the relevant Python files. This will require you the dive into the code and check what are the relevant files.
Also, are there any ways of installing modules besides pip?
Yes, according to Installing Python Modules (Legacy version) modules packaged using distutils should be downloaded, unpacked and command
python setup.py install
or similar should be run. Beware that
The entire distutils package has been deprecated and will be removed
in Python 3.12.
I am trying to create a binary python package that can be installed without compilation. The python package consists only one extension module written in C using the Python API. The extension module depending on the stable python ABI by using Py_LIMITED_API with 0x03060000 (3.6). Up to my knowledge, this means the extension module can work for all CPython versions that are not older than 3.6. I managed to create the sdist package, and I explored dumb, egg and wheel formats. I managed to create the binary packages, but none of them is perfect for my use case.
The problem is the extension module is depending on libssl.so, and this is the only "external" dependency of it. Because the python package itself is very stable, it doesn't require frequent releases. Therefore I wouldn't like to include libssl.so and take the burden of releasing new versions because of the security updates of OpenSSL (not mentioning to educate the users to update their python package regularly). I think because of this, the wheel format is not suitable, because the linux_*.whl packages cannot be uploaded to PyPI, but the manylinux2014_*.whl tag has to include libssl.so (and its dependencies) in the package. The dumb package are not suitable for PyPI based distribution, the egg format is not supported by pip, so they are also not suitable.
Because of the stable ABI and the widespread of libssl.so, I think it should be possible to release a single binary package for the most linux distributions and multiple python versions (similarly to the manylinux tags with wheel). Of course the pacakge would require libssl to be installed on the machine, but that is something I can accept for achieving better security. And that's where I am stuck.
My question is how can I create a binary python package, which
contains a python extension module,
depends on the system-wide installed libssl.so,
and can uploaded to and installed via pip and PyPI?
I tried to explore other possibilities, but I couldn't find anything else, so if you have any tips for other formats to look after, I would appreciate that also.
Got a problem here, if I'm using cython in my package, the compiled .pyd file differents from different python version, for example, .pyd file compiled under python3.7 will not be recognized by python3.8 . If I'd like to release my package to pypi , let's say for example the version number be 1.0.0, how can I upload the package, let different version's python running the same command pip install package==1.0.0 and get its own version's compiled file separately?
Thanks.
Typically there are two kinds of archive formats that you should publish. They are called "distribution packages" and we have:
"source distributions" ("sdist" for short);
and "built distributions" (or "binary distributions", "bdist" for short).
Sdists should not contain any platform specific components, in your case should not contain the compiled Cython code.
And bdists on the other hand are meant to contain compiled code (compiled Cython files for example) and thus are allowed to be platform specific.
Nowadays the most common and the only recommended kind of bdist format is "wheel".
So you should distribute (publish on PyPI) the sdist of your project, and if possible try to build as many platform specific wheels as you can.
See for example the distribution packages for pandas v2.2.1, there is exactly one sdist and many different wheels that cover a wide range of:
Python interpreter versions
operating system
CPU bitness
etc.
Notice how all the file names are different. PyPI does not allow uploading files with the exact same name. pip (and other packaging-related tools) are able to interpret those file names and make educated guesses about what their contents are, and thus pick the right distributions to download and install.
This question already has answers here:
What is a Python egg?
(4 answers)
Closed 6 years ago.
When I need a Python library, I use pip to fetch it from PyPi and if I create a project and want to share it, I just need to have in place the setup.py file and that would make it easily installable. Therefore, I was wondering what is the use case for egg or wheel packages.
The Python Packaging User Guide has to say the following on this topic:
Wheel and Egg are both packaging formats that aim to support the use case of needing an install artifact that doesn’t require building or compilation, which can be costly in testing and production workflows.
These formats can be used to distribute packages that contain binary extension modules. These would otherwise require compilation during installation.
If no compilation is involved a source distribution is in principle sufficient, but the user guide still recommends to create a wheel for performance reasons:
Minimally, you should create a Source Distribution:
python setup.py sdist
A “source distribution” is unbuilt (i.e, it’s not a Built Distribution), and requires a build step when installed by pip. Even if the distribution is pure python (i.e. contains no extensions), it still involves a build step to build out the installation metadata from setup.py.
[...]
You should also create a wheel for your project. A wheel is a built package that can be installed without needing to go through the “build” process. Installing wheels is substantially faster for the end user than installing from a source distribution.
In short, packages are a convenience thing - mostly for the user.
Wheel packages unify the process of distributing and installing projects that contain pure python, platform dependent code, or compiled extensions. The user does not need to worry if the package is written in Python or in C - it just works.
Egg packages are an older standard, you should ignore them nowadays. Use pip install . instead of ./setup.py install to prevent creating them. (addendum: They are also .zips in disguise, from which Python reads package data — not exactly the most performant solution)
Wheel packages, on the other hand, are the new standard. They allow for creation of portable binary packages for Windows, macOS, and Linux (yes, Linux!). Nowadays, you can just do pip install PyQt5 (as an example) and it will just work, no C++ compiler and Qt libraries required on the system. Everything is pre-compiled and included in the wheel. Non-binary packages also benefit, because it’s safer not to run setup.py (all the metadata is in the wheel). (addendum: those are also .zips, but they are unpacked when installed)
I am a bit confused. There seem to be two different kind of Python packages, source distributions (setup.py sdist) and egg distributions (setup.py bdist_egg).
Both seem to be just archives with the same data, the python source files. One difference is that pip, the most recommended package manager, is not able to install eggs.
What is the difference between the two and what is 'the' way to do distribute my packages?
(Note, I am not wanting to distribute my packages through PyPI, but I want to use a package manager that fetches my dependencies from PyPI)
setup.py sdist creates a source distribution: it contains setup.py, the source files of your module/script (.py files or .c/.cpp for binary modules), your data files, etc. The result is an archive that can then be used to recompile everything on any platform.
setup.py bdist (and bdist_*) creates a built distribution: it includes .pyc files, .so/.dll/.dylib for binary modules, .exe if using py2exe on Windows, your data files... but no setup.py. The result is an archive that is specific to a platform (for example linux-x86_64) and to a version of Python, and that can be installed simply by extracting it into the root of your filesystem (executables are in /usr/bin (or equivalent), data files in /usr/share, modules in /usr/lib/pythonX.X/site-packages/...). You can even build rpm archives that can be directly installed using your package manager.
2021 update: the tools to build and use eggs no longer exist in Python.
There are many more than two different kind of Python (distribution) packages. This command lists many subcommands:
$ python setup.py --help-commands
Notice the various different bdist types.
An egg was a new package type, introduced by setuptools but later adopted by the standard library. It is meant to be installed monolithic onto sys.path. This differs from an sdist package which is meant to have setup.py install run, copying each file into place and perhaps taking other actions as well (building extension modules, running additional arbitrary Python code included in the package).
eggs are largely obsolete at this point in time. EDIT: eggs are gone, they were used with the command "easy_install" that's been removed from Python.
The favored packaging format now is the "wheel" format, notably used by "pip install".
Whether you create an sdist or an egg (or wheel) is independent of whether you'll be able to declare what dependencies the package has (to be downloaded automatically at installation time by PyPI). All that's necessary for this dependency feature to work is for you to declare the dependencies using the extra APIs provided by distribute (the successor of setuptools) or distutils2 (the successor of distutils - otherwise known as packaging in the current development version of Python 3.x).
https://packaging.python.org/ is a good resource for further information about packaging. It covers some of the specifics of declaring dependencies (eg install_requires but not extras_require afaict).