Just a heads-up that this may be an obvious question. I'm writing a package that will be generally distributed, and I don't want to have to do any support in the future (don't ask). It relies on Python's standard library with one exception. If that one exception ever gets removed from PyPI, I don't want to have to update my code.
So my question is: can I include the package I downloaded from PyPI within my own package, so that it always exists in its current state and users don't have to download it separately? If so, do I just move the package from my sys.path into my package?
Thank you, and sorry if it's an obvious question.
In short: yes, you can. However, it's not usually necessary, because pip supports specifying the needed version in setup.py and will take care of installing the package for you.
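For example, a minimal setup.py sketch (the project name, the dependency name somedep, and the pinned version are all placeholders) showing how pip is told to install the dependency alongside your package:

from setuptools import setup

setup(
    name="mypackage",                    # placeholder name
    version="1.0",
    packages=["mypackage"],
    # Pin the exact version you tested against; pip fetches and installs it for the user.
    install_requires=["somedep==1.2.3"],
)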
My question may seem really naive, but I have never found any clue about it in web resources.
The question is: concerning the install_requires argument of the setup() function (or the setup.cfg file), is it good practice to mention every package used, even Python built-in ones such as os, for example?
One can assume that any Python environment has those common modules, so is it problematic to mention them explicitly in the setup, making it potentially over-verbose?
Thanks
install_requires should include requirements outside the standard library, together with constraints on their versions (as needed).
For example, this would declare minimum versions for numpy and scipy, but allow any version of scikit-learn:
from setuptools import setup

setup(
    # ...
    install_requires=["numpy>=1.13.3", "scipy>=0.19.1", "scikit-learn"],
)
Modules such as os and sys are part of Python's standard library, so they should not be included.
As @sinoroc mentioned, only direct 3rd-party dependencies should be declared here. Dependencies of your dependencies are handled automatically. (For example, scikit-learn depends on joblib; when the former is required, the latter will be installed.)
The standard library modules are listed here: https://docs.python.org/3/library/
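On Python 3.10 and later you can also check programmatically whether a name belongs to the standard library; a quick sketch:

import sys

# sys.stdlib_module_names exists from Python 3.10 onwards.
for name in ("os", "sys", "numpy"):
    print(name, "stdlib" if name in sys.stdlib_module_names else "not stdlib")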
I've found it helpful to read other packages and see how their setup.py files are defined, for example:
imbalanced-learn
pandas
You should list top-level 3rd party dependencies.
Don't list packages and modules from Python's standard library.
Do list 3rd party dependencies your code directly depends on, i.e. the projects containing:
the packages and modules your code imports;
the binaries your code directly calls (in subprocesses for example).
Don't list dependencies of your dependencies.
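For instance, a sketch (the project names are only illustrative): if your code imports requests, which itself pulls in urllib3 and certifi, only requests belongs in install_requires:

from setuptools import setup

setup(
    # ...
    # requests is imported directly by this project's code, so it is listed;
    # urllib3 and certifi are dependencies of requests and get installed automatically.
    install_requires=["requests"],
)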
install_requires should mention only packages that are not part of the standard library.
If you're afraid some module might not be available due to Python versioning, you can specify that your package requires a Python version greater than or equal to X.
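For example, a minimal setuptools sketch declaring a minimum interpreter version with python_requires (the version shown is just illustrative):

from setuptools import setup

setup(
    # ...
    # pip will refuse to install the package on older interpreters.
    python_requires=">=3.8",
)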
Note that packaging Python packages is a notoriously hairy business.
I'd recommend you take a look at pyproject.toml; projects that use it can be pip-installed like any normal package, and it is used by some of the more modern tools such as Poetry.
I've gone down the Python packaging and distribution rabbit-hole and am wondering:
Is there ever any reason to provide standard library modules/packages as dependencies to setup() when using setuptools?
And a side note: the only setuptools docs I have found are terrible. Is there anything better?
Thanks
In a word, no. The point of the dependencies is to make sure they are available when someone installs your library. Since the standard library is always available with a Python installation, you do not need to include its modules.
For a more user-friendly guide to packaging, check out Hynek's guide:
Sharing Your Labor of Love: PyPI Quick and Dirty
No — In fact, you should never specify standard modules as setup() requirements, because those requirements are only for downloading packages from PyPI (or VCS repositories or other package indices). Adding, say, "itertools" to install_requires will mean that your package will fail to install because its dependencies can't be satisfied because there's (currently) no package named "itertools" on PyPI. Some standard modules do share their name with a project on PyPI; in the best case (e.g., argparse), the PyPI project is compatible with the standard module and only exists as a separate project for historical reasons/backwards compatibility. In the worst case... well, use your imagination.
I am forking a Python package, and I expect the package author to merge my changes in the near future. The author doesn't release very often, though, so I expect to have my temporary fork as a dependency for some of my other packages. I need to create an appropriate version number for my fork that is pip/setuptools compliant.
Let's say the current version is 1.6.4, and I expect the author's next release to be 1.6.5. Would an appropriate version for the fork be 1.6.4.1 or 1.6.5.dev20140520? Both seem to be compliant with PEP 440, but I have also had experience with recent versions of pip not finding dev releases unless you specifically pass the --pre flag. It seems that 1.6.4.1 would be a good choice, but I don't know how happy pip will be with an N.N.N.N format (e.g. will pip treat it as a pre-release?).
Is there some standard convention for this? Note, I don't want to change the name of the author's package, but I do need a temporary fork that my other packages can install with minimal issues.
It seems that there is no official convention for naming a fork of a Python package. As @larsman pointed out in the question comments, a traditional convention for forking package-1.6.4 is package-1.6.4-forkname-0.1. While this has been used by the Linux community (and others) for years, it has lost favor for Python packages, mainly because it does not follow the versioning conventions accepted by pip. If you search for "fork" on PyPI's package index (https://pypi.python.org/pypi?%3Aaction=search&term=fork&submit=search) you'll see that two common pip-compliant patterns seem to be emerging:
package-forkname-1.6.4
forkname-1.6.4, where forkname is a "clever" variant on packagename (e.g. PIL and pillow)
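As a side note on the 1.6.4.1 versus 1.6.5.dev20140520 question above, you can check how pip will order and classify candidate version strings with the third-party packaging library (which pip vendors internally); a quick sketch:

from packaging.version import Version  # pip install packaging

print(Version("1.6.4.1") > Version("1.6.4"))        # True: sorts after the original release
print(Version("1.6.4.1") < Version("1.6.5"))        # True: sorts before the author's next release
print(Version("1.6.4.1").is_prerelease)             # False: N.N.N.N is treated as a normal release
print(Version("1.6.5.dev20140520").is_prerelease)   # True: .dev releases need pip's --pre flag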
My package has a dependency on the latest version of the jsonpickle package. Older versions are installable via pip; however, I need the latest version (i.e. the one on GitHub) for it to work. Is it generally considered OK in this situation to bundle the latest version of jsonpickle with my code? Is there some other solution? I'd rather not have to ask my users to clone from GitHub.
I'm thinking of organising my package like this:
My package/
|-- __init__.py
|-- file1.py
|-- file2.py
\-- jsonpickle/  (latest)
i.e. doing what was done here: Python: importing a sub-package or sub-module
As kag says, this is generally not a good idea. It's not that it's "frowned upon" as being unfriendly to the other packages, it's that it causes maintenance burdens for you and your users. (Imagine that there's a bug that's fixed in jsonpickle that affects your users, but you haven't picked up the fix yet. If you'd done things normally, all they'd have to do is upgrade jsonpickle, but if you're using an internal copy, they have to download the jsonpickle source and yours, hack up your package, and install it all manually.)
Sometimes, it's still worth doing. For example, the very popular requests module includes its own copy of other packages such as urllib3. And yes, it does face both of the costs described above. But it also means that each version of requests can rely on one exact, specific version of urllib3. Since requests makes heavy use of urllib3's rarely-used interfaces, and even has workarounds for some of its known bugs, that can be valuable.
In your case, that doesn't sound like the issue. You just need a bleeding-edge version of jsonpickle temporarily, until the upstream maintainers upload a new version to PyPI. The problem isn't that you don't want your users all having different versions; it's that you don't want to force them to clone the repo and figure out how to install it manually. Fortunately, pip takes care of that for you, by wrapping most of the difficulties up in one line:
pip install git+https://github.com/foo/bar
It's not a beautiful solution, but it's only temporary, right?
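If you'd rather have this handled automatically when your own package is installed, newer versions of pip and setuptools also accept PEP 508 direct URL requirements in install_requires; a hedged sketch (the repository URL is the same placeholder as above, and note that PyPI rejects uploaded packages whose dependencies are direct URLs):

from setuptools import setup

setup(
    # ...
    install_requires=[
        # Install jsonpickle straight from Git until a fixed release reaches PyPI;
        # append @<tag-or-commit> to the URL if you need a reproducible snapshot.
        "jsonpickle @ git+https://github.com/foo/bar",
    ],
)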
It's generally not the best idea to bundle some dependency with your project. Some projects do it anyway, or bundle it as an alternative if there's no system package available. (This is mostly found in C projects, not Python.)
You didn't mention what "latest" means exactly. Is this the latest version on PyPI?
The best way to make sure that a specific version of a package (or anything newer than a baseline version) is installed is to specify the requirement properly in the requires section of setup.py. Read more about requires here [1]. That way pip can take care of resolving dependencies, and if the package is available on PyPI the installation will be automatic.
[1] http://docs.python.org/2/distutils/setupscript.html#relationships-between-distributions-and-packages
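A minimal sketch of the setuptools equivalent (install_requires has largely replaced the distutils requires metadata linked above; the version number is only illustrative):

from setuptools import setup

setup(
    # ...
    # Accept any jsonpickle at or above the baseline version known to work.
    install_requires=["jsonpickle>=0.7.0"],
)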
I have packages A and B; both have their own Git repository, PyPI page, etc. Package A depends on package B, and by using the install_requires keyword I can get A to automatically download and install B.
But suppose I want to go a step further for my especially non-savvy users: I want to actually include package B within the tar/zip for package A, so that no download is necessary (this also lets them potentially make any by-hand edits to package B's setup.cfg).
Is there a suggested (ideally automated) way to:
Include B in A when I call sdist for A
Tell setuptools that B is bundled with A for resolving the dependency (something like a local dependency_links)
Thanks!
This is called 'vendoring' (sometimes 'vendorizing'), and no, there is no built-in support for such an approach.
It is also a bad idea; you want to leave installation to the specialized tools, which not only manage dependencies but also what versions are installed. With tools like buildout or pip's requirements.txt format you can control very precisely what versions are being used.
By bundling a version of a dependency inside, you either force upon your users what version they get to use, or make it harder for such tools to ensure the used versions for a given installation are consistent. In addition, you are potentially wasting bandwidth and space; if other packages have also included the same requirements, you now have multiple copies. And if your dependency is updated to fix a critical security issue, you have to re-release any package that bundles it.
In the past, some packages did vendor dependencies into their distributions. requests was a prime example; they stepped away from this approach because it complicated their release process: every time there was a bug in one of the vendored packages, they had to produce a new release to include the fix.
If you do want to persist in including packages, you'll have to write your own support. I believe requests simply added the vendored packages to their repository by hand, keeping a static copy that they updated from time to time. Alternatively, you could extend setup.py to download the code when you are creating a distribution.
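If you do go down that road, here is a rough, untested sketch of what extending setup.py might look like: a custom sdist command that fetches a snapshot of the dependency into a _vendor directory before the archive is built (the repository URL and paths are placeholders):

import subprocess
from setuptools import setup
from setuptools.command.sdist import sdist as _sdist

class VendoredSdist(_sdist):
    def run(self):
        # Grab a shallow snapshot of the dependency into the package tree
        # before the source distribution is assembled.
        subprocess.check_call([
            "git", "clone", "--depth", "1",
            "https://github.com/example/somedep",  # placeholder URL
            "mypackage/_vendor/somedep",           # placeholder destination
        ])
        _sdist.run(self)

setup(
    # ...
    cmdclass={"sdist": VendoredSdist},
)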