What is `build` in the context of Python?

I'm currently learning about Python distribution packages, and reading through this article, and it says:
pyproject.toml tells build tools (like pip and build) what is required to build your project. This tutorial uses setuptools, so open pyproject.toml and enter the following content:
It mentions the concept of building Python packages a few more times.
As far as I know, at least when talking about pure Python code, Python distributions (sdists and wheels) only contain .py source files. So what is meant by the author when speaking about building?

build may have two meanings.
In your description, building means putting all project files into a single archive with the extension .whl (a wheel) and/or .tar.gz (an sdist), so that later users (using pip) download everything as a single .whl (or .tar.gz) file.
So we could say that this kind of building means packing.
The second kind of build (not the one in your description) can happen when people install a package. Some packages contain C/C++ code (e.g. numpy), and when they are installed from source, that code has to be compiled for your platform after downloading.
So we could say that this kind of building means compiling.
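To make the "packing" meaning concrete, here is a toy sketch (not a real build backend) that zips a pure-Python package into a wheel-shaped archive. It shows that, besides the .py files, building also generates metadata files (METADATA, WHEEL, RECORD) inside a *.dist-info directory. The names pack_pure_wheel and demo_pkg are hypothetical, and the metadata is only the bare minimum; in practice you would run the build frontend (python -m build) and let a backend such as setuptools produce the wheel.

```python
import base64
import hashlib
import zipfile
from pathlib import Path


def _record_entry(arcname: str, data: bytes) -> str:
    # RECORD lines are "path,sha256=<urlsafe base64 digest without padding>,size".
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{arcname},sha256={digest.decode()},{len(data)}"


def pack_pure_wheel(src_pkg: Path, name: str, version: str, out_dir: Path) -> Path:
    """Toy sketch: zip a pure-Python package plus minimal metadata into a .whl file."""
    dist_info = f"{name}-{version}.dist-info"
    files = {}
    # The package's .py files are copied as-is: nothing gets compiled.
    for path in sorted(src_pkg.rglob("*.py")):
        arcname = f"{src_pkg.name}/{path.relative_to(src_pkg).as_posix()}"
        files[arcname] = path.read_bytes()
    # Metadata that a real build backend would generate for you.
    files[f"{dist_info}/METADATA"] = (
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    ).encode()
    files[f"{dist_info}/WHEEL"] = (
        "Wheel-Version: 1.0\nGenerator: toy-sketch\n"
        "Root-Is-Purelib: true\nTag: py3-none-any\n"
    ).encode()
    record = [_record_entry(arcname, data) for arcname, data in files.items()]
    record.append(f"{dist_info}/RECORD,,")  # RECORD does not hash itself
    files[f"{dist_info}/RECORD"] = ("\n".join(record) + "\n").encode()

    out_dir.mkdir(parents=True, exist_ok=True)
    wheel_path = out_dir / f"{name}-{version}-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, data in files.items():
            zf.writestr(arcname, data)
    return wheel_path
```

Calling pack_pure_wheel(Path("src/demo_pkg"), "demo_pkg", "0.1.0", Path("dist")) would produce dist/demo_pkg-0.1.0-py3-none-any.whl, which is just a zip file you can open with any archive tool.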

Related

How to structure and distribute Pybind11 extension with stubs?

I'm trying to create and distribute (with pip) a Python package that has Python code, and C++ code compiled to a .pyd file with Pybind11 (using Visual Studio 2019). I also want to include .pyi stub files for VS Code and other editors. I can't find much documentation on doing this correctly.
I'd like to be able to just install the package via pip as normal, and write from mymodule.mysubmodule import myfunc etc., like a normal Python package, including autocompletion, type annotations, VS Code IntelliSense, etc., using the .pyi files I'd write.
My C++ code is in multiple .cpp and header files. It uses a few standard libraries and a few external libraries (such as Boost). It defines a single module and 2 submodules. I want to be able to distribute this on Windows and Linux, and for x86 and x64. I am currently targeting Python 3.9 and the C++17 standard.
How should I structure and distribute this package? Do I include the C++ source files and create a setup.py similar to the Pybind11 example? If so, how do I include the external libraries? And how do I structure the .pyi stub files? Does this mean whoever tries to install my package would need a C++ compiler as well?
Or, should I compile my c++ to a .pyd/.so file for each platform and architecture? If so, is there a way to specify which one gets installed through pip? And again, how would I structure the .pyi stubs?
Generating .pyi stubs
The pybind11 issue mentions a couple of tools (1, 2) to generate stubs for binary modules. There could be more, but I'm not aware of others. Unfortunately, both are far from perfect, so you will probably still need to check and adjust the generated stubs manually.
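To show the basic idea, and why the output needs manual checking, here is a toy stub generator; it is not one of the tools mentioned above. It writes one def line per public function using inspect.signature, and falls back to a permissive (*args, **kwargs) placeholder when no signature can be introspected, which is typically the case for compiled pybind11 functions. toy_stub is a hypothetical name.

```python
import inspect
import types


def toy_stub(module: types.ModuleType) -> str:
    """Toy sketch: return rough .pyi text for the public functions of *module*."""
    lines = []
    for name, obj in sorted(vars(module).items()):
        # Skip private names, classes, and non-callable attributes.
        if name.startswith("_") or inspect.isclass(obj) or not callable(obj):
            continue
        try:
            sig = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Compiled functions often expose no introspectable signature;
            # emit a permissive placeholder that has to be fixed by hand.
            sig = "(*args, **kwargs)"
        lines.append(f"def {name}{sig}: ...")
    return "\n".join(lines) + "\n"
```

For a module holding def myfunc(x: int, scale: float = 1.0) -> float, toy_stub returns the line def myfunc(x: int, scale: float = 1.0) -> float: ... which you would save as mymodule.pyi next to the compiled module.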
Distribution of .pyi stubs
After correcting the stubs, you just include those .pyi files in your distribution (e.g. in the wheel or in the sources) along with a py.typed marker file or, alternatively, distribute them separately as a standalone stub package (e.g. mypackage-stubs).
Building wheels
Wheels allow users of your library to install it in binary form, i.e. without compilation. Wheels are built with older compilers in order to be compatible with a greater number of systems/platforms, so you might face some trouble with a C++17 library. (C++11 is old enough and should cause no problems with wheels.)
Building wheels for various platforms is tedious; pybind11's python_example uses the cibuildwheel package to do that, and I would recommend this route if you are already using CI.
If no wheel is available for the target platform, pip will attempt to install from source. This requires a compiler and the third-party libraries you are using to be already installed.
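To see what "target platform" means for a given interpreter, this sketch reports the pieces that compiled wheels are tied to: an approximate python tag, the generic platform tag, and the file suffixes the import system accepts for extension modules. It only approximates what pip computes (the packaging library's packaging.tags module is the real implementation, and Linux wheels on PyPI use manylinux/musllinux tags rather than the generic one); local_build_info is a hypothetical name.

```python
import importlib.machinery
import sys
import sysconfig


def local_build_info() -> dict:
    """Sketch: summarise what makes compiled wheels specific to this interpreter."""
    impl = sys.implementation.name  # e.g. "cpython" or "pypy"
    # Wheel python tags abbreviate the implementation: cp = CPython, pp = PyPy.
    prefix = {"cpython": "cp", "pypy": "pp"}.get(impl, impl)
    return {
        "python_tag": f"{prefix}{sys.version_info.major}{sys.version_info.minor}",
        # Generic platform tag, e.g. "linux_x86_64" or "win_amd64".
        "platform_tag": sysconfig.get_platform().replace("-", "_").replace(".", "_"),
        # Suffixes accepted for compiled extension modules (.so / .pyd variants).
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


print(local_build_info())
```

The extension suffixes usually embed the interpreter version (for example a cpython-39 or cp39 fragment), which is why a module compiled for one Python version is not picked up by another.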
Maybe conda?
If your setup is complex and requires a number of third-party libraries, it might be worth writing a conda recipe and using conda-forge to generate binary versions of the package. Conda is superior to pip in this respect, since it can manage non-Python dependencies as well.

How to release different version's binary to pypi.org?

I have a problem: I'm using Cython in my package, and the compiled .pyd file differs between Python versions; for example, a .pyd file compiled under Python 3.7 will not be recognized by Python 3.8. If I'd like to release my package to PyPI, say with version number 1.0.0, how can I upload it so that different Python versions running the same command pip install package==1.0.0 each get their own compiled file?
Thanks.
Typically there are two kinds of archive formats that you should publish. They are called "distribution packages" and we have:
"source distributions" ("sdist" for short);
and "built distributions" (or "binary distributions", "bdist" for short).
Sdists should not contain any platform-specific components; in your case, they should not contain the compiled Cython code.
And bdists on the other hand are meant to contain compiled code (compiled Cython files for example) and thus are allowed to be platform specific.
Nowadays the most common and the only recommended kind of bdist format is "wheel".
So you should distribute (publish on PyPI) the sdist of your project, and if possible try to build as many platform specific wheels as you can.
See for example the distribution packages for pandas v2.2.1, there is exactly one sdist and many different wheels that cover a wide range of:
Python interpreter versions
operating system
CPU bitness
etc.
Notice how all the file names are different. PyPI does not allow uploading files with the exact same name. pip (and other packaging-related tools) are able to interpret those file names and make educated guesses about what their contents are, and thus pick the right distributions to download and install.
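As a rough illustration of how those file names encode compatibility, here is a small sketch that splits a wheel file name into its tags and checks it against a set of supported (python, abi, platform) triples. The function names are hypothetical and the supported set is written by hand here; pip computes it from the running interpreter (the packaging library provides packaging.utils.parse_wheel_filename and packaging.tags for the real thing). The file names used are illustrative.

```python
from typing import NamedTuple, Optional, Set, Tuple


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def split_wheel_filename(filename: str) -> WheelName:
    """Split '{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl' into parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # A tag field may hold several dot-separated tags, e.g. "py2.py3".
    return WheelName(name, version, build, tuple(py.split(".")),
                     tuple(abi.split(".")), tuple(plat.split(".")))


def is_compatible(wheel: WheelName, supported: Set[Tuple[str, str, str]]) -> bool:
    """True if any (python, abi, platform) combination of the wheel is supported."""
    return any(
        (py, abi, plat) in supported
        for py in wheel.python_tags
        for abi in wheel.abi_tags
        for plat in wheel.platform_tags
    )


# Hand-written stand-in for what a CPython 3.12 interpreter on 64-bit Windows accepts.
cp312_windows = {("cp312", "cp312", "win_amd64"), ("py3", "none", "any")}
print(is_compatible(split_wheel_filename("pandas-2.2.1-cp312-cp312-win_amd64.whl"), cp312_windows))  # True
print(is_compatible(split_wheel_filename("pandas-2.2.1-cp311-cp311-win_amd64.whl"), cp312_windows))  # False
```

This is the answer to the original question in miniature: publish one wheel per (Python version, platform) under the same version number, and each interpreter picks the file whose tags it supports.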

Why do we need Python packaging (e.g. egg)? [duplicate]

This question already has answers here:
What is a Python egg?
Closed 6 years ago.
When I need a Python library, I use pip to fetch it from PyPI, and if I create a project and want to share it, I just need a setup.py file in place to make it easily installable. Therefore, I was wondering what the use case for egg or wheel packages is.
The Python Packaging User Guide has the following to say on this topic:
Wheel and Egg are both packaging formats that aim to support the use case of needing an install artifact that doesn’t require building or compilation, which can be costly in testing and production workflows.
These formats can be used to distribute packages that contain binary extension modules. These would otherwise require compilation during installation.
If no compilation is involved, a source distribution is in principle sufficient, but the user guide still recommends creating a wheel for performance reasons:
Minimally, you should create a Source Distribution:
python setup.py sdist
A “source distribution” is unbuilt (i.e, it’s not a Built Distribution), and requires a build step when installed by pip. Even if the distribution is pure python (i.e. contains no extensions), it still involves a build step to build out the installation metadata from setup.py.
[...]
You should also create a wheel for your project. A wheel is a built package that can be installed without needing to go through the “build” process. Installing wheels is substantially faster for the end user than installing from a source distribution.
In short, packages are a convenience thing - mostly for the user.
Wheel packages unify the process of distributing and installing projects that contain pure python, platform dependent code, or compiled extensions. The user does not need to worry if the package is written in Python or in C - it just works.
Egg packages are an older standard; you should ignore them nowadays. Use pip install . instead of ./setup.py install to avoid creating them. (Addendum: they are also .zip files in disguise, from which Python reads package data, which is not exactly the most performant solution.)
Wheel packages, on the other hand, are the new standard. They allow for creation of portable binary packages for Windows, macOS, and Linux (yes, Linux!). Nowadays, you can just do pip install PyQt5 (as an example) and it will just work, no C++ compiler and Qt libraries required on the system. Everything is pre-compiled and included in the wheel. Non-binary packages also benefit, because it’s safer not to run setup.py (all the metadata is in the wheel). (addendum: those are also .zips, but they are unpacked when installed)

an alternative to setup.py

setup.py has one significant problem:
it cannot be parsed securely.
This leads to a lot of problems: it cannot be analysed securely, reading 100k+ packages from PyPI requires too much overhead, source packages cannot be automatically converted to native system formats like those of Debian and Fedora, etc.
So, are there any alternatives for packaging Python source that use a static data format (not setup.py) to describe and wrap their contents? That way a source package would just be a .zip file of a source checkout, with no magic build steps required.
Python wheels are the answer to the problems you describe: http://pythonwheels.com/
However, at the time of writing many projects do not supply wheels (but you can build them yourself.)
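As a sketch of why wheels help with the static-metadata problem: a wheel's metadata is a plain text file (*.dist-info/METADATA, in email-header format), so it can be read without executing any code from the package. The function below is hypothetical and assumes the archive contains exactly one top-level .dist-info directory.

```python
import io
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_bytes: bytes) -> dict:
    """Sketch: read Name, Version and Requires-Dist from a wheel without running it."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        candidates = [
            n for n in zf.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError(f"expected one .dist-info/METADATA, found {candidates}")
        # Core metadata uses email-style "Key: value" header lines.
        msg = Parser().parsestr(zf.read(candidates[0]).decode("utf-8"))
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires": msg.get_all("Requires-Dist", []),
    }
```

Usage would look like read_wheel_metadata(Path("dist/some_project-1.0-py3-none-any.whl").read_bytes()) (hypothetical path); nothing inside the package is imported, unlike running setup.py to discover the same information.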

What is the difference between an 'sdist' .tar.gz distribution and an python egg?

I am a bit confused. There seem to be two different kinds of Python packages: source distributions (setup.py sdist) and egg distributions (setup.py bdist_egg).
Both seem to be just archives with the same data, the python source files. One difference is that pip, the most recommended package manager, is not able to install eggs.
What is the difference between the two, and what is 'the' way to distribute my packages?
(Note, I am not wanting to distribute my packages through PyPI, but I want to use a package manager that fetches my dependencies from PyPI)
setup.py sdist creates a source distribution: it contains setup.py, the source files of your module/script (.py files or .c/.cpp for binary modules), your data files, etc. The result is an archive that can then be used to recompile everything on any platform.
setup.py bdist (and bdist_*) creates a built distribution: it includes .pyc files, .so/.dll/.dylib for binary modules, .exe if using py2exe on Windows, your data files... but no setup.py. The result is an archive that is specific to a platform (for example linux-x86_64) and to a version of Python, and that can be installed simply by extracting it into the root of your filesystem (executables are in /usr/bin (or equivalent), data files in /usr/share, modules in /usr/lib/pythonX.X/site-packages/...). You can even build rpm archives that can be directly installed using your package manager.
2021 update: the tools to build and use eggs no longer exist in Python.
There are many more than two different kinds of Python (distribution) packages. This command lists many subcommands:
$ python setup.py --help-commands
Notice the various different bdist types.
An egg was a new package type, introduced by setuptools but later adopted by the standard library. It is meant to be installed monolithic onto sys.path. This differs from an sdist package which is meant to have setup.py install run, copying each file into place and perhaps taking other actions as well (building extension modules, running additional arbitrary Python code included in the package).
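To illustrate what "installed monolithic onto sys.path" means for a zipped egg, the sketch below puts a zip archive itself on sys.path and imports a module straight out of it through Python's built-in zipimport support. It is not how eggs were actually built or installed (that was setuptools/easy_install); import_from_zip and the module name are hypothetical.

```python
import importlib
import sys
import zipfile
from pathlib import Path


def import_from_zip(module_name: str, source: str, workdir: Path):
    """Sketch: write a one-module zip, put the archive on sys.path, and import from it."""
    archive = workdir / f"{module_name}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{module_name}.py", source)
    # Like a zipped egg, the archive file (not a directory) is the sys.path entry;
    # the standard zipimport machinery reads the module directly from it.
    sys.path.insert(0, str(archive))
    try:
        return importlib.import_module(module_name)
    finally:
        # Only the path entry is removed; the imported module stays in sys.modules.
        sys.path.remove(str(archive))
```

An sdist, by contrast, has to go through an install step that copies files into place; a wheel is also a zip, but pip unpacks it into site-packages instead of importing from the archive.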
eggs are largely obsolete at this point. EDIT: eggs are gone; they were used with the easy_install command, which setuptools has since removed.
The favored packaging format now is the "wheel" format, notably used by "pip install".
Whether you create an sdist or an egg (or wheel) is independent of whether you'll be able to declare what dependencies the package has (to be downloaded automatically from PyPI at installation time). All that's necessary for this dependency feature to work is for you to declare the dependencies using the extra APIs provided by distribute (a fork of setuptools, since merged back into setuptools) or distutils2 (the intended successor of distutils, otherwise known as packaging in the development version of Python 3.x at the time).
https://packaging.python.org/ is a good resource for further information about packaging. It covers some of the specifics of declaring dependencies (e.g. install_requires, but as far as I can tell not extras_require).
