When packaging a Python package with a setup.py that uses setuptools:
from setuptools import setup
...
the source distribution created by:
python setup.py sdist
not only includes, as usual, the files specified in MANIFEST.in, but it also, gratuitously, includes all of the files that Subversion lists as being version controlled beneath the package directory. This is vastly annoying. Not only does it make it difficult to exercise any sort of explicit control over what files get distributed with my package, but it means that when I build my package following an "svn export" instead of an "svn checkout", the contents of my package might be quite different, since without the .svn metadata setuptools will make different choices about what to include.
My question: how can I turn off this terrible behavior, so that "setuptools" treats my project the same way whether I'm using Subversion, or version control it's never heard of, or a bare tree created with "svn export" that I've created at the end of my project to make sure it builds cleanly somewhere besides my working directory?
The best I have managed so far is an ugly monkey-patch:
from setuptools.command import sdist
del sdist.finders[:]
But this is Python, not the jungle, so of course I want a better solution that involves no monkeys at all. How can I tame setuptools, turn off its magic, and have it behave sensibly by looking at the visible, predictable rules in my MANIFEST.in instead?
I know you know much of this, Brandon, but I'll try to give as complete an answer as I can (although I'm no setuptools guru) for the benefit of others.
The problem here is that setuptools itself involves quite a lot of black magic, including using an entry point called setuptools.file_finders where you can add plugins to find files to include. I am, however, at a complete loss as to how to REMOVE plugins from it...
Quick workaround: svn export your package to a temporary directory and run the setup.py from there. That means you have no svn, so the svn finder finds no files to include. :)
Longer workaround: Do you really need setuptools? Setuptools has a lot of features, so the answer is likely yes, but the main ones are dependencies (so your dependencies get installed by easy_install), namespace packages (foo.bar), and entry points. Namespace packages can actually be created without setuptools as well. But if you use none of these you might actually get away with just using distutils.
Ugly workaround: The monkeypatch you gave to sdist in your question, which simply makes the plugin have no finders, so it finds nothing.
So as you see, this answer, although as complete as I can make it, is still embarrassingly incomplete. I can't actually answer your question, though I think the answer is "You can't".
Create a MANIFEST.in file with:
recursive-exclude . *
# other MANIFEST.in commands go here
# to explicitly include whatever files you want
See http://docs.python.org/distutils/commandref.html#sdist-cmd for the MANIFEST.in syntax.
Simple solution: do not use setuptools for creating the source distribution; downgrade to distutils for that command:
from distutils.command.sdist import sdist
from setuptools import setup
setup(
    # ... all the usual setup arguments ...
    cmdclass={'sdist': sdist},
)
Probably the answer is in your setup.py. Do you use find_packages? This function by default uses the VCS (e.g. subversion, hg, ...). If you don't like it, just write a different Python function which collects only the things you want.
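If that is the case, a minimal sketch (the package and exclude names below are only placeholders) would be to either list the packages explicitly or keep find_packages but exclude what you don't want:
# Sketch: replace the default find_packages() call with something explicit.
# "mypackage" and "tests" are placeholder names.
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1',
    packages=find_packages(exclude=['tests', 'tests.*']),
    # or, fully explicit:
    # packages=['mypackage', 'mypackage.subpackage'],
)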
I would argue that the default sdist behavior is correct. When you are building a source distribution, I would expect it to contain everything that is checked into Subversion. Of course it would be nice to be able to override it cleanly in special circumstances.
Compare sdist to bdist_egg; I bet only the files that are specified explicitly get included.
I did a simple test with three files, all in svn: an empty dummy.lkj, an empty foobar.py, and a setup.py looking like this:
import setuptools
setuptools.setup(name='foobar', version='0.1', py_modules=['foobar'])
sdist creates a tarball that includes dummy.lkj. bdist_egg creates an egg that does not include dummy.lkj.
You probably want something like this:
from distutils.core import setup

def packages():
    import os
    packages = []
    for path, dirs, files in os.walk("yourprogram"):
        if ".svn" in dirs:
            dirs.remove(".svn")
        if "__init__.py" in files:
            packages.append(path.replace(os.sep, "."))
    return packages

setup(
    # name, version, description, etc...
    packages=packages(),
    # package_data, data_files, etc...
)
Ok, so you clone a repo, there's an import
import yaml
ok, so you do pip install yaml and you get:
ERROR: No matching distribution found for yaml
Ok, so you look for a package with yaml in it, and there's like a gazillion of them... usually adding py in front does the job, but...
How on earth should I know which one was used?!
And it's not just yaml, oh no... there's:
import cv2 # python-opencv
import PIL # Pillow
and the list goes on and on...
How can I know which import uses which package? Shouldn't there be a PEP for this? Or a naming convention, e.g. import is always the same as the package name?
There's a similar topic here, if you're not frustrated enough :)
[When I clone a repo,] How can I know which import uses which package?
In short: it is the cloned code's responsibility to explain this, and it is an expected courtesy that the cloned code includes an installer that will take care of it.
If this is just some random person's bundle of .py files on GitHub with no installation instructions, look for notes in the associated documentation; failing that, make an issue on the tracker. (Or just give up. Maybe look for a better-engineered project that does the same thing.)
However, most "serious", contemporary Python projects are meant to be installed by using some form of packaging system. These have evolved over the years, and best practices have changed many times; but generally speaking, a properly "packaged" and "distributed" project will have either a setup.py or (newer; better in many ways, but not universally adopted yet) pyproject.toml file at the top level.
A pyproject.toml file is a config file in TOML format that simply describes a bunch of project metadata. This requires a build backend conforming to PEP 517. For a while, this required third-party tools, such as Poetry; but the standard setuptools can handle this since version 40.8.0. (As of this writing, the current release is 65.7.0.)
A setup.py script is executable code that pip will invoke after downloading a package from PyPI (or another package index). Generally, this script will use either setuptools or distutils (the predecessor to setuptools; it has finally been officially deprecated in 3.10, and will be removed in 3.12) to install the project, by calling a function named setup and passing it a big dict with some project metadata.
Security warning: this file is still executable code. It is arbitrary code, and it doesn't have to be following the standard conventions. Also, the package that is actually downloaded from PyPI doesn't necessarily match the project's source shown on GitHub (or another Git provisioning website), if such is even available. (This problem also affects package managers in other languages and ecosystems, notably npm for Javascript.)
With the setup.py based approach, package dependencies are specified using a keyword argument to the setup function. The specification has changed many times; currently, projects still using a setup.py should use the install_requires keyword argument.
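For example (the project and dependency names here are only placeholders), a setup.py declaring its dependencies might look like:
from setuptools import setup

setup(
    name='myproject',            # placeholder name
    version='0.1',
    packages=['myproject'],
    install_requires=[
        'PyYAML>=5.0',           # provides "import yaml"
        'opencv-python',         # provides "import cv2"
    ],
)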
With the pyproject.toml based approach, using setuptools' backend, dependencies will be an array (using JSON terminology, as TOML is a superset) stored under project.dependencies. This will vary for other backends; for example, Poetry expects this information under tool.poetry.dependencies.
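As a rough sketch (names are placeholders), the corresponding pyproject.toml for the setuptools backend could look like:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "myproject"
version = "0.1"
dependencies = [
    "PyYAML>=5.0",
    "opencv-python",
]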
In any event, pip freeze will output a list of what's installed in the current environment. It's a somewhat common practice for developers to test the code in a virtual environment where the dependencies are installed, dump this output to a requirements.txt file, and include that as documentation.
[When I want to use a third-party library in my own code,] How can I know which import uses which package?
It's worth considering the question the other way around, too: given that we have installed OpenCV for Python using pip install opencv-python, and want to use it in our own code, how do we know to import cv2 specifically?
The answer: there is no convention, and certainly no requirement for the installed package name to match the PyPI name, nor the GitHub etc. repository name. Read the documentation. Everyone who intends for their code to be used as a library, will be more than willing to show how, on at least a basic level.
Watch for requirements.txt. Big projects usually have it. You can install the packages listed in this file. Otherwise, just google.
Keep in mind that it might not be a pip package.
Probably what is happening is that the main script is trying to import a secondary script (yaml.py, in this case) with functions or utils for the main script to use.
Check if the repo contains a file named yaml.py. If that's the case, make sure to run the main script while yaml.py is in the same directory.
Also, check for a requirements.txt file.
You can install all the requirements inside the file running in shell this line:
pip install -r <path to your requirements.txt>
Hope that this helps.
Any package on PyPI or cloned from some online repository is free to set itself up with a base directory name it chooses. That base directory xyz determines the import xyz line. Additionally a package name on PyPI doesn't have to match the repository name where its source code revisions are kept (assuming there is any).
This has the disadvantage that there is no one-to-one relation between package name, repo and/or import line. But the advantage is that you can e.g. install Pillow, which is backwards compatible with PIL, and still use import PIL instead of changing all your sources to use import Pillow as PIL.
If the repo you clone has a requirements.txt, look there; you can also look in the setup.py for extras_require. But there is no guarantee that these are available, or contain the names of the packages to install (e.g. I use a generic setup.py that reads its info from a data structure in the __init__.py file when creating/installing a package).
yaml seems to be a reserved name on PyPI (at least when I tried to upload a package with that name a few years ago). So that might be the reason the package is named PyYAML, although the Py is not very informative, as the Python code will not function in another programming language anyway. PyPI's search is not very helpful either, as its relevance ordering is not very relevant (at least not for yaml).
PyPI has no entry in its metadata for the import line, but you could extract that from a .whl package file, as the import name is the top-level directory that is not the .dist-info directory. This is normally not possible from a .tar.gz package file. I don't know of any site that does this kind of automatic scraping.
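A rough sketch of that idea (not a robust tool, just an illustration using the standard library):
# List the top-level names inside a downloaded wheel; these usually
# correspond to what you would actually import.
import zipfile

def import_names(wheel_path):
    with zipfile.ZipFile(wheel_path) as wheel:
        top_level = {name.split("/")[0] for name in wheel.namelist()}
    return sorted(name for name in top_level
                  if not name.endswith((".dist-info", ".data")))

# e.g. import_names("some_package-1.0-py3-none-any.whl")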
You can click through the packages on PyPI, after searching the import term, and hope you find something that matches the import in the documentation, but that is no guarantee you get the right one.
You might be best off searching for import yaml here on stackoverflow, and hope that the question or the answer mentions the package name.
thank you very much for your help and ideas. Big thanks to Karl Knechter for his exhaustive answer.
tl;dr: I think using some sort of "package" / "distribution" as a standard, would make everyone's lives easier.
However, my question was half-theoretical, to point out something I'd call, an incoherence in Python. You are of course right, there should be setuptools or requirements.txt or at least some documentation. But, if there isn't any, we're prone to error or additional browsing.
GospelBG pointed out something important. There could be a script yaml.py in the main folder and we need to check and/or guess.
Most importantly, naming imports differently than packages is just plainly misleading. There should be a naming convention or a PEP for this. Again, you can of course eventually get the proper package etc., but it's not explicit and obvious, and it should be! Because in programming, we like it that way, don't we?
I'm no seasoned dev in Python and I'm learning C++, but e.g. in C++, you import a header file with a particular name and static or dynamic libraries by their filename. Now I know this is a very "step-by-step, on-foot" method, but at least you use the exact filenames.
On the upper level you have CMake, which would be an equivalent of setuptools where using find_package or find_library you can import package / library. To be honest, I'm not sure if all packages have the exact equivalent name, but at least the ones I used, did match.
Thanks again for your help and answers! I'm open for discussion and comments :)
In my office we have a quite complex directory structure when it comes to our code.
One of the things we have is a libs module to drop "common" things used by other parts of our big application (or set of applications... that are all living under a common directory).
The code in that libs/ directory requires certain packages installed in order for it to work. In said libs/ directory we have a requirements.txt file that supposedly lists the dependencies required for the things (things being Python code) in it to work. We have been filling that requirements.txt file pretty manually, tracking that "if this .py file uses this module, we should add it to the requirements file", so it's almost certain that by now we have forgotten to add some required modules.
Because of the complex structure we have (some parts use pipenv, some others have their own requirements.txt...), it is very hard to know whether a required module is going to end up installed or not.
So I would like to make sure that this libs/ directory (cough, cough... module ) has all its dependencies listed in its libs/requirements.txt.
Is that possible? Ideally it'd be "run this command passing /libs/ as an argument, it'll scan the directory and tell you what packages are needed by the py(s) found in it"
Thank you in advance.
Unfortunately, python does not know whether its dependencies are satisfied until runtime. requirements.txt is just a helper file for pip and similar tools, and you have to update it manually.
That said, you could
use the os module to recursively get a list of all *.py files in the folder
parse each one of them for lines having the format import aaa.bbb or from aaa import bbb
keep a set of the imports
However, even in that case, the name of the imported module is not the same as the name you need to pass to pip (eg, import yaml requires pyyaml in requirements.txt), but at least it could be a hint of what's missing.
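A minimal sketch of that approach (it only catches simple, static import statements, and it does not map module names to PyPI names):
import os
import re

# Matches the first dotted-name component of "import aaa.bbb" or "from aaa import bbb".
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)")

def find_imports(root):
    found = set()
    for path, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                with open(os.path.join(path, name), encoding="utf-8") as handle:
                    for line in handle:
                        match = IMPORT_RE.match(line)
                        if match:
                            found.add(match.group(1))
    return sorted(found)

print(find_imports("libs"))   # "libs" being the directory from the question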
I would like to create an egg from two directories and want to include .config and .log files. The structure of the directories is the following:
MSKDataDownloader
|_______configs
|________sensors.config
MSKSubscriber
|_______doc
|________dependencies.log
Here's my setup.py file:
from setuptools import setup, find_packages
setup(
    name='MSKDataDownloader',
    version='1.0.0',
    description='Data Downloader',
    packages=find_packages(),
    include_package_data=True,
    package_data={
        'MSKDataDownloader': ['config/*.config'],
        'MSKSubscriber': ['doc/*.log'],
        'MSKSubscriber': ['config/*.config']
    }
)
What am I doing wrong? Why is it not including the .config and .log files in the egg?
The problem is that include_package_data=True doesn't mean what you think it means (or what most reasonable people would think it means). The short version is, just get rid of it.
From the docs:
If set to True, this tells setuptools to automatically include any data files it finds inside your package directories that are specified by your MANIFEST.in file. For more information, see the section below on Including Data Files.
If you follow the link, you'll see that it in fact makes setuptools ignore whatever you told it explicitly in package_data, and instead look for every file mentioned in MANIFEST.in and find it within your directory tree (or source control tree):
If using the setuptools-specific include_package_data argument, files specified by package_data will not be automatically added to the manifest unless they are listed in the MANIFEST.in file.
And, since you don't have a MANIFEST.in, this means you end up with nothing.
So, you want to do one of two things:
Remove include_package_data=True.
Create a MANIFEST.in and remove package_data=….
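For the first option, a sketch based on the files shown in the question (note that the directory in the tree is "configs", while the original glob says "config"):
from setuptools import setup, find_packages

setup(
    name='MSKDataDownloader',
    version='1.0.0',
    description='Data Downloader',
    packages=find_packages(),
    # no include_package_data here; package_data alone decides what gets added
    package_data={
        'MSKDataDownloader': ['configs/*.config'],
        'MSKSubscriber': ['doc/*.log', 'config/*.config'],
    },
)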
This is all complicated by the fact that there are lots of examples and blog posts and tutorials left over from the distribute days¹ that are just plain wrong for modern setuptools. In fact, there are a whole lot more out-of-date and wrong posts out there than correct ones.
The obvious answer is to just use the tutorials and examples from the PyPA on pypa.org… but unfortunately, they haven't yet written tutorials that cover anywhere near everything that you need.
So, often, you pretty much have to read old tutorials, then look up everything they tell you in the reference docs to see which parts are wrong.
1. IIRC, in distribute, the include_package_data=True would cause your extra files to get added to an sdist, just not to anything else. Which still sounds useless, right? Except that you could make your egg and other distributions depend on building the sdist then running a script that generates the MANIFEST.in programmatically. Which was useful for… I forget, something to do with pulling version files from source control maybe?
Imagine that we are given the finished C++ source code of a library, called MyAwesomeLib. The goal is to expose some of its power to Python, so we create a wrapper using SWIG and generate a Python package called PyMyAwesomeLib.
The directory structure now looks like
root_dir
|-src/
|-lib/
| |- libMyAwesomeLib.so
| |- _PyMyAwesomeLib.so
|-swig/
| |- PyMyAwesomeLib.py
|-python/
|- Script_using_myawesomelib.py
So far so good. Ideally, all we want to do next is to copy lib/*.so, swig/*.py and python/*.py into the corresponding directories in site-packages in a pythonic way, i.e. using
python setup.py install
However, I got very confused when trying to achieve this simple goal using setuptools and distutils. Both tools handle the compilation of Python extensions through an internal system, where the source files, compiler flags etc. are passed using setup(ext_modules=[Extension(...)]). But this is ridiculous since MyAwesomeLib has a fully functioning build system that is based on makefiles. Porting the logic embedded in makefiles would be redundant and completely unnecessary work.
After some research, it seems there are two options left: I can either override setuptools.command.build and setuptools.command.install to use the existing makefile and copy the results directly, or I can somehow let setuptools know about these files and ask it to copy them during installation. The second way is more appealing, but it is what gives me the most headache. I have tried the following options without success:
package_data and include_package_data do not work because *.so files are not under version control and they are not inside of any package.
data_files does not seem to work since the files only get included when running python setup.py sdist, but ignored when running python setup.py install. This is the opposite of what I want. The .so files should not be included in the source distribution, but should get copied during the installation step.
MANIFEST.in failed for the same reason as data_files.
eager_resources does not work either, but honestly I do not know the difference between eager_resources and data_files or MANIFEST.in.
I think this is actually a common situation, and I hope there is a simple solution to it. Any help would be greatly appreciated.
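For reference, a rough sketch of the first approach mentioned above (overriding a build command so that it runs the existing makefile and copies the result); all paths and names below are taken from the directory tree or assumed:
import os
import subprocess
from setuptools import setup
from setuptools.command.build_py import build_py

class BuildWithMake(build_py):
    def run(self):
        # Run the library's own build system first.
        subprocess.check_call(["make"])
        # Let setuptools handle the pure-Python parts as usual.
        build_py.run(self)
        # Copy the prebuilt shared object into the build tree so that
        # "python setup.py install" picks it up.
        self.copy_file("lib/_PyMyAwesomeLib.so",
                       os.path.join(self.build_lib, "_PyMyAwesomeLib.so"))

setup(
    name='PyMyAwesomeLib',
    version='0.1',
    package_dir={'': 'swig'},              # the SWIG-generated wrapper lives here
    py_modules=['PyMyAwesomeLib'],
    cmdclass={'build_py': BuildWithMake},
)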
Porting the logic embedded in makefiles would be redundant and completely unnecessary work.
Unfortunately, that's exactly what I had to do. I've been struggling with this same issue for a while now.
Porting it over actually wasn't too bad. distutils does understand SWIG extensions, but this was implemented rather haphazardly on their part. Running SWIG creates Python files, and the current build order assumes that all Python files have been accounted for before running build_ext. That one wasn't too hard to fix, but it's annoying that they would claim to support SWIG without mentioning this. Distutils attempts to be cross-platform when compiling things, so there is still an advantage to using it.
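For illustration, declaring the SWIG wrapper as an extension might look roughly like this (the C++ source file, include paths and flags are assumptions, since the question only shows the directory layout):
from setuptools import setup, Extension

pymyawesomelib = Extension(
    '_PyMyAwesomeLib',
    sources=['swig/PyMyAwesomeLib.i', 'src/MyAwesomeLib.cpp'],  # assumed file names
    swig_opts=['-c++'],            # have SWIG generate C++ wrappers
    include_dirs=['src'],
    libraries=['MyAwesomeLib'],    # link against the already-built library
    library_dirs=['lib'],
)

setup(
    name='PyMyAwesomeLib',
    version='0.1',
    ext_modules=[pymyawesomelib],
)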
If you don't want to port your entire build system over, use the system's package manager. Many complex libraries do this (but they also try their best with setup.py). For example, to get numpy and lxml on Ubuntu you'd just do:
sudo apt-get install python-numpy python-lxml. No pip.
I realize you'd rather write one setup file instead of dealing with every package manager ever so this is probably not very helpful.
If you do try to go the setuptools route there is one fatal flaw I ran into: dependencies.
For instance, if you are distributing a SWIG-based project, it's going to need libpython. If they don't have it, an error like this happens:
#include <Python.h>
error: File not found
That's pretty unhelpful to the average user.
Even worse, if you require a shared library but the user's library is out of date, the user can get some crazy errors. You're at the mercy of their C++ compiler to output Google-friendly error messages so they can figure it out.
The long-term solution would be to get setuptools/distutils to get better at detecting non-python libraries, hopefully as good as Ruby's gem. I pretty much had to roll my own. For instance, in this setup.py I'm working on you can see a few functions at the top I hacked together for dependency detection (still doesn't work on all systems...definitely not Windows).
Currently I'm using the auto-tools to build/install and package a project of mine, but I would really like to move to something that feels more "pythonic".
My project consists of two scripts, one module, two glade GUI descriptions, and two .desktop files. It's currently a pure python project, though that's likely to change soon-ish.
Looking at setuptools I can easily see how to deal with everything except the .desktop files; they have to end up in a specific directory so that Gnome can find them.
Is using distutils/setuptools a good idea to begin with?
I managed to get this to work, but it kinda feels to me more like a workaround.
Don't know what's the preferred way to handle this...
I used the following setup.py file (full version is here):
from setuptools import setup
setup(
    # ...
    data_files=[
        ('share/icons/hicolor/scalable/apps', ['data/mypackage.svg']),
        ('share/applications', ['data/mypackage.desktop'])
    ],
    entry_points={
        'console_scripts': ['startit=mypackage.cli:run']
    }
)
The starter script through entry_points works. But the data_files were put in an egg file and not in the folders specified, so they can't be accessed by the desktop shell.
To work around this, I used the following setup.cfg file:
[install]
single-version-externally-managed=1
record=install.txt
This works. Both data files are created in the right place and the .desktop file is recognized by Gnome.
In general, yes - everything is better than autotools when building Python projects.
I have good experiences with setuptools so far. However, installing files into fixed locations is not a strength of setuptools - after all, it's not something to build installers for Python apps with, but to distribute Python libraries.
For the installation of files which are not application data files (like images, UI files etc.) but provide integration into the operating system, you are better off using a real packaging format (like RPM or deb).
That said, nothing stops you from having the build process based on setuptools and a small make file for installing everything into its rightful place.
You can try to use python-distutils-extra. The DistUtilsExtra.auto module automatically supports .desktop files, as well as Glade/GtkBuilder .ui files, Python modules and scripts, misc data files, etc.
It should work both with Distutils and Setuptools.
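A minimal sketch of what that looks like (assuming the .desktop and .ui files live where DistUtilsExtra.auto expects to find them, e.g. under data/ and bin/):
# Sketch using python-distutils-extra's "auto" module, which discovers
# .desktop files, GtkBuilder .ui files, scripts and data automatically.
from DistUtilsExtra.auto import setup

setup(
    name='mypackage',        # placeholder name
    version='0.1',
    description='Example GNOME application',
)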
I've created https://pypi.python.org/pypi/install-freedesktop. It creates .desktop files automatically for the gui_scripts entry points, which can be customized through a setup argument, and supports --user as well as system-wide installation. Compared to DistUtilsExtra, it's more narrow in scope and IMHO more pythonic (explicit is better than implicit).