Is there a way to bundle multiple packages together with Python setuptools?

Is there a way to bundle multiple packages together with Python setuptools? - python

I have packages A and B, both have their own git repository, PyPI page, etc... Package A depends on package B, and by using the install_requires keyword I can get A to automatically download and install B.
But suppose I want to go a step further for my especially non-savvy users; I want to actually include package B within the tar/zip for package A, so no download is necessary (this also lets them potentially make any by-hand edits to package B's setup.cfg)
Is there a suggested (ideally automated) way to,
Include B in A when I call sdist for A
Tell setuptools that B is bundled with A for resolving the dependency (something like a local dependency_links)
Thanks!

This is called 'vendorizing' and no, there is no support for such an approach.
It is also a bad idea; you want to leave installation to the specialized tools, which not only manage dependencies but also what versions are installed. With tools like buildout or pip's requirements.txt format you can control very precisely what versions are being used.
By bundling a version of a dependency inside, you either force upon your users what version they get to use, or make it harder for such tools to ensure the used versions for a given installation are consistent. In addition, you are potentially wasting bandwidth and space; if other packages have also included the same requirements, you now have multiple copies. And if your dependency is updated to fix a critical security issue, you have to re-release any package that bundles it.
In the past, some packages did use vendorize packaging to include a dependency into their distribution. requests was a prime example; they stepped away from this approach because it complicated their release process. Every time there was a bug in one of the vendored packages they had to produce a new release to include the fix, for example.
If you do want to persist in including packages, you'll have to write your own support. I believe requests manually just added the vendorized packages to their repository; so they kept a static copy they updated from time to time. Alternatively, you could extend setup.py to download code when you are creating a distribution.

Related

Is there an equivalent to NPM's "peerDependencies" in Python (preferably in setuptools)?

I am looking for a way to express something that resembles NPM's peerDependencies in setuptools.
My Python library is a plugin that should work with another Python library that I don't want to have as a dependency. Instead I'd like my end user to be responsible for it and install it by themselves. I can't find a proper way to express this in Python's setuptools (or any other build for that matter), to let my user "know" about the library
From my current understanding, this is a close approximation between the tools:
NPM
setuptools
dependencies
install_requires
optionalDependencies
extras_require
peerDependencies
???
I have two possible solutions, both of which I find lacking:
Use extras_require anyway
Specify my requirements under an extras_require will do the job, but it feels like abusing it, because the user shouldn't install these extras.
Just document it
Inform the user that they need to install that package separately. Feels lame too

To my understanding, there is no need for that, as Python handles dependencies different anyway.
With npm/node you can have multiple versions of one package withing the same environment, if it required by one of the sub-dependencies (How does NPM handle version conflicts?)
This is not possible in Python. Only a single version of a library can be installed in the one environment. This means that the general issue that you are trying to solve with peer-dependencies, can not happen.
My recommendation is to specify your dependencies as normal, but with a loose version specifier.

How to install modules from a package?

For installing python packages, is there a way to only install the specific modules I need (plus dependencies) into the virtual environment?
For example, I import the following in my code
from statsmodels.tsa.arima_model import ARIMA
I only need ARIMA (and dependencies), and won't need the complete statsmodels library.
How do I accomplish that?
Rationale:
I am trying to zip up the packages and deploy to AWS lambda. However, I am exceeding the 50MB file size limit.
If there is a way to install only the modules I need from the packages, I can reduce the file size.

Here's a horrible brute-force approach. I hope someone gives you a better answer!
Assuming the source you want to install is public, its licensing allows this kind of use, and you are willing to deal with the hassle, I imagine you could:
copy the source
remove anything unused by your required module
place that module's source inside your package, and wire it up from there. (This answer proposes a reasonable file structure for vendored software)
Alternately, if the assumptions above hold but you prefer not to vendor directly within your software, you could probably:
copy the source (e.g. fork the repo)
remove anything not used by your required module, making sure to leave a working python package
use pip or your environment's native dependency management tools to download and install your package in the virtual env.
Please note that both of these approaches will expose you to a number of risks, including but not limited to:
breaking the module: what happens to your project if you accidentally created a logical or security flaw?
loss of currency: how do you plan to keep your new mini-package up to date?
liability: Does the vendored software's license allow this kind of use? Are you adequately protected if there is something wrong with the software you have modified?

Keep track of pypi dependencies - who is using my packages

Is there anyway through either pip or PyPi to identify which projects (published on Pypi) might be using my packages (also published on PyPi) - I would like to identify the user base for each package and possible attempt to actively engage with them.
Thanks in advance for any answers - even if what I am trying to do isn't possible.

This is not really possible. There is no easily accessible public dataset that lets you produce a dependency graph.
At most you could scan all publicly available packages to parse their dependencies, but even then those dependencies are generated by running the setup.py script, so the dependencies can be set dynamically. It is quite common to adjust the dependencies based on the Python version (installing a backport of a standard-library dependency on older Python versions, for example). People have done this before but this is not an easy, lightweight task.
Note that even then, you'll only find publicly declared dependencies. Any dependencies declared by private packages, not published to PyPI or another public repository, can't be accounted for.

Python package import subpackage - good practice?

My package has a dependency on the latest version of the jsonpickle package. Older versions are installable via pip, however I need the latest version (i.e on Github) for it to work. Is it generally considered OK in this situation to bundle the latest version of jsonpickle in my code? Is there some other solution? I'd rather not ask my users not to clone from github.
I'm thinking of organising my package like this:
My package
|
__init__.py
file1.py
file2.py
\
jsonpickle (latest)
i.e Doing what was done here: Python: importing a sub‑package or sub‑module

As kag says, this is generally not a good idea. It's not that it's "frowned upon" as being unfriendly to the other packages, it's that it causes maintenance burdens for you and your users. (Imagine that there's a bug that's fixed in jsonpickle that affects your users, but you haven't picked up the fix yet. If you'd done things normally, all they'd have to do is upgrade jsonpickle, but if you're using an internal copy, they have to download the jsonpickle source and yours, hack up your package, and install it all manually.)
Sometimes, it's still worth doing. For example, the very popular requests module includes its own copy of other packages like urllib3. And yes, it does face both of the costs described above. But it also means that each version of request can rely on an exact specific version of urllib3. Since requests makes heavy use of urllib3's rarely-used interfaces, and even has workarounds for some of its known bugs, that can be valuable.
In your case, that doesn't sound like the issue. You just need a bleeding-edge version of jsonpickle temporarily, until the upstream maintainers upload a new version to PyPI. The problem isn't that you don't want your users all having different versions; it's that you don't want to force them to clone the repo and figure out how to install it manually. Fortunately, pip takes care of that for you, by wrapping most of the difficulties up in one line:
pip install git+https://github.com/foo/bar
It's not a beautiful solution, but it's only temporary, right?

It's generally not the best idea to bundle some dependency with your project. Some projects do it anyway, or bundle it as an alternative if there's no system package available. (This is mostly found in C projects, not Python.)
You didn't mention what the "latest" means exactly. Is this the latest in pypi?
The best way to make sure a specific version, or greater than a baseline version, of a package is installed is to properly specify the requirement in setup.py requires section. Read more about requires here [1]. This way pip can take care of resolving dependencies, and if it's available in pypi it will be automatic.
[1] http://docs.python.org/2/distutils/setupscript.html#relationships-between-distributions-and-packages

State of Python Packaging: Buildout, Distribute, Distutils, EasyInstall, etc

The last time I had to worry about installing Python packages was two years ago working with Enthought, NumPy and MayaVi2. That experience gave me lingering nightmares related to quirky behavior installing & updating Python packages in non-standard locations (in $HOME/usr/local2.6/, for example).
Anyway, my work is taking me back to installing various Python packages. The CheeseShop Tutorial mentions DistUtils and EasyInstall in addition to Buildout! I am having a hard time finding one place that compares these (and other) PyPi installation tools, so I am hoping to tap into the StackOverflow community: What are the strengths & weaknesses of each installation tool?

First of all, regardless of installation tool you decide on, start using virtualenv --no-site-packages! That way, python packages are not installed globally and you can easily get back to where you were in old as well as new projects.
Now, your comparison is a little bit apples-to-pears as the tools you list are not mutually exclusive. However, I can wholly recommend Buildout. It will install python packages as well as other stuff and lets you automate installation and deployment of (complex) projects.
Also, I recommend looking into Fabric as a means to automate administrative tasks.

I've done quiet a bit of research on this topic(a couple of weeks worth) before settling down on using buildout for all of my projects.
DistUtils and EasyInstall in addition to Buildout!
The difficulty in creating one place to compare all of these tools is that they're all part of a same tool chain and are used together to create a predictable, reliable and flexible tool set.
For example, easy_install is used to install distutils packages from pypi(cheeseshop) to your system Python's site-packages directory. This drastically simplifies installation of packages to your system/global sys.path.
easy_install is very convenient for packages that are consistent for all projects. But, I find that I prefer to use system's easy_install to install packages that projects do not depend on. For example, github-cli I use with every project, because it allows me to interact with project's Github Issues from command line. I use this with projects, but it's for convenience and the project itself does not have dependancy on this package.
For managing project's dependancies, I use buildout. Buildout allows you to indicate specifically what version of packages your project depends on. I prefer buildout over pip-requirements.txt because buildout is declarative. With pip, you install the packages and at the end of the development you generate the requirements.txt file. With Buildout on the other hand, you modify the buildout.cfg before the package egg is added to your project. This forces me to be conscious of what packages I'm adding to the project.
Now, there is a matter of virtualenv. One of the most publicized features of virtualenv is obviously --no-site-packages option. I have not found that option to be particularly useful, because I use buildout. Buildout manages the sys.path and includes only the packages I ask tell it to include. It also, includes everything in system Python's site-packages but since I don't have anything there that I use in projects, I never have conflicts.
Also, I find that --no-site-packages only hinders my development process, because some packages I install using my sistem's packaging system. Usually, anything that has C libraries that need to be compiled, I install through the system's packaging system.
In the project's fabfile.py I include test function to test for presence of system packages that I install through system's package manager.
In summary, here is how I use these tools:
System's Package Manager(apt-get, yam, port, fink ...)
I use one of these to install python versions that I need on this system. I also use it to install packages like lxml which include c libraries.
easy_install
I use to install packages from pypi that I use on all projects, but projects are not dependant on these packages.
buildout
I use to manage dependancies of a project.
In my experience, this workflow has been very flexible, portable and easy to work with.

Distribute is a new fork of setuptools (easy_install), which should also be considered. Even Guido recommends it.
Buildout is orthogonal to the packaging --- you can use buildout with distribute.

Whenever I need to remind myself of the state of play, I look at these as a starting point:
The State of Python Packaging, a response to:
On packaging, linked from:
Tools of the Modern Python Hacker

I can't easily help you with finding the strength, but I can make it a bit harder, since it also depends on the platform you want to use.
For example if you need to install python packages on Gentoo (GNU/Liunx) based computers, you can easily use g-pypi to create ebuilds for all packages which use distutils (rather: a setup.py). That way they get completely integrated into your system and can be added, updated and removed like all your other tools. But it naturally only works for Gentoo-based systems.
Also you can use yolk to find out about all packages installed via easy_install on your system (not only on Gentoo).
When I write code, I simply use distutils (because it allows building portage ebuilds very easily) and sometimes basic setuptools features, or organize my programs so people can just download and run them from the program folder (ideally just unpack the source archive / clone the repository somewhere). This isn't the perfect solution, but until the core python team decides which way they want to move, I don't want to fix onto a path (anymore) which might disappear.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.