My package depends on the latest version of the jsonpickle package. Older versions are installable via pip, but I need the latest version (i.e. the one on GitHub) for my package to work. Is it generally considered OK in this situation to bundle the latest version of jsonpickle in my code? Is there some other solution? I'd rather not ask my users to clone from GitHub.
I'm thinking of organising my package like this:
My package/
    __init__.py
    file1.py
    file2.py
    jsonpickle/    (latest)
i.e. doing what was done here: Python: importing a sub-package or sub-module
As kag says, this is generally not a good idea. It's not that it's "frowned upon" as being unfriendly to the other packages, it's that it causes maintenance burdens for you and your users. (Imagine that there's a bug that's fixed in jsonpickle that affects your users, but you haven't picked up the fix yet. If you'd done things normally, all they'd have to do is upgrade jsonpickle, but if you're using an internal copy, they have to download the jsonpickle source and yours, hack up your package, and install it all manually.)
Sometimes, it's still worth doing. For example, the very popular requests module includes its own copy of other packages like urllib3. And yes, it does face both of the costs described above. But it also means that each version of requests can rely on an exact, specific version of urllib3. Since requests makes heavy use of urllib3's rarely-used interfaces, and even has workarounds for some of its known bugs, that can be valuable.
In your case, that doesn't sound like the issue. You just need a bleeding-edge version of jsonpickle temporarily, until the upstream maintainers upload a new version to PyPI. The problem isn't that you don't want your users all having different versions; it's that you don't want to force them to clone the repo and figure out how to install it manually. Fortunately, pip takes care of that for you, by wrapping most of the difficulties up in one line:
pip install git+https://github.com/foo/bar
It's not a beautiful solution, but it's only temporary, right?
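And if you're worried about the bleeding edge shifting underneath you in the meantime, pip can also pin a git requirement to a specific tag or commit (the repository and ref here are placeholders, as above):

pip install git+https://github.com/foo/bar@v1.2.3

Once the maintainers publish to PyPI, you can swap this back for an ordinary versioned requirement.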
It's generally not the best idea to bundle a dependency with your project. Some projects do it anyway, or bundle it as a fallback if no system package is available. (This is mostly found in C projects, not Python.)
You didn't mention what "latest" means exactly. Is it the latest version on PyPI?
The best way to make sure a specific version, or anything above a baseline version, of a package is installed is to specify the requirement properly in the requires section of setup.py. Read more about requires here [1]. That way pip can take care of resolving dependencies, and if the package is available on PyPI the installation will be automatic.
[1] http://docs.python.org/2/distutils/setupscript.html#relationships-between-distributions-and-packages
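As a minimal sketch of that, using setuptools' install_requires (the counterpart of the distutils requires field; the package name and version floor here are illustrative):

    from setuptools import setup

    setup(
        name="mypackage",
        version="0.1",
        packages=["mypackage"],
        # Ask for jsonpickle at or above a known-good baseline; pip will
        # resolve and install it from PyPI automatically.
        install_requires=["jsonpickle>=0.7.0"],
    )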
Suppose I have the following PyPIs:
public PyPI (standard packages)
GitLab PyPI (because internal team ABC wanted to use this)
Artifactory PyPI (because contractor team DEF wanted to use this)
Now suppose a package named "apples" exists on all of them, but they are not the same thing: three entirely different packages on the three PyPIs. How do I do something in my requirements and setup.py to map each package name to the PyPI it should come from?
Something like:
package_def==1.2.3 --index-url=artifactory
apples==1.08 --index-url=gitlab # NOT FROM PUBLIC OR FROM ARTIFACTORY
package_abc==1.2.3 --index-url=artifactory
package_efg==1.0.0 # public pypi
I don't even know how I'd configure the setup.py in this instance either.
I really don't want multiple requirements.txt files with different index URLs at the top. I also don't want --extra-index-url, due to the vulnerabilities it can introduce when using a private PyPI.
I tried googling around, messing around with the order of requirements.txt, breaking it up into different files, etc. No luck. It seems that the last --index-url given is always used to install all the packages.
Any ideas?
The question gets back to the idea that a package dependency specification is usually a statement of need that is independent of how that need should be satisfied.
So the dependency declaration "foo==1.0.0" (the thing declared as part of the package metadata) means "I need the package named foo at version 1.0.0", and that is in principle implementation-independent. You can install that package with pip from PyPI, but you could also use a different tool and/or a different source to satisfy that requirement (e.g. conda, installation from source, etc.).
This distinction is the reason why there's no good way to do this.
There are a few workarounds:
You can specify the full link to the wheel you want pip to install (see the example below).
You can use an alternative tool like Poetry, which supports this a little more cleanly.
For my particular use case, I just listed the full link to the wheel I wanted pip to install, since moving to Poetry is out of scope at the moment.
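For illustration, with a reasonably recent pip such a direct-reference requirement line might look like this (the host, path, and filename are all placeholders); pip installs the wheel from exactly that URL, so no index lookup happens for that package:

apples @ https://gitlab.example.com/api/v4/projects/123/packages/pypi/files/apples-1.08-py3-none-any.whl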
Just a heads-up that this may be an obvious question. I'm writing a package that will be generally distributed, and I don't want to have to do any support in the future (don't ask). It relies on Python's standard library, with one exception. If that one exception gets removed from PyPI in the future, I don't want to have to update my code.
So my question is: can I include the package I downloaded from PyPI within my own package, so that it always exists in its current state and users don't have to download it separately? If so, do I just move the package from my sys.path into my package?
Thank you, and sorry if it's an obvious question.
In short: yes, you can. However, it's not particularly necessary, because pip supports specifying the needed version in setup.py and will take care of installing the package.
I have a Python project stored on GitHub that is going to use tweepy, a Twitter Python library that is also stored on GitHub. The INSTALL file says that I can use git to bundle it in my project, but I have no idea how to do this. I thought of using git submodule, but that will fetch the whole project, not only the sources, which is all I need. How could I do it? Is there a best way to work with external libraries in a project?
You can't get just a subdirectory with git-submodule (see this question).
Bundling is also bad practice (it makes updating the library code in the case of a library bug more difficult, and results in useless duplication). tweepy is on PyPI, so the best way would not be bundling it, but to require it in your setup.py or list it in your requirements.txt (which one depends on whether you're packaging your project for PyPI or just need an easy way to install its dependencies). You can specify which versions of the library are permissible when doing so, so tweepy can't change out from under your feet.
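For example, a bounded specifier in requirements.txt (the bounds here are illustrative) keeps you on a range of tweepy you've actually tested against:

tweepy>=2.3,<3.0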
If you're planning on making changes to tweepy, the polite thing would be to submit those changes to the project, rather than making the changes within your project (or perhaps maintaining your own fork on GitHub if tweepy won't accept the changes). Note that pip has ways to install git versions of a package if you need a version that's not available in PyPI.
I have packages A and B, both have their own git repository, PyPI page, etc... Package A depends on package B, and by using the install_requires keyword I can get A to automatically download and install B.
But suppose I want to go a step further for my especially non-savvy users; I want to actually include package B within the tar/zip for package A, so no download is necessary (this also lets them potentially make any by-hand edits to package B's setup.cfg)
Is there a suggested (ideally automated) way to:
Include B in A when I call sdist for A
Tell setuptools that B is bundled with A for resolving the dependency (something like a local dependency_links)
Thanks!
This is called 'vendorizing' and no, there is no support for such an approach.
It is also a bad idea; you want to leave installation to the specialized tools, which not only manage dependencies but also what versions are installed. With tools like buildout or pip's requirements.txt format you can control very precisely what versions are being used.
By bundling a version of a dependency inside, you either force upon your users what version they get to use, or make it harder for such tools to ensure the used versions for a given installation are consistent. In addition, you are potentially wasting bandwidth and space; if other packages have also included the same requirements, you now have multiple copies. And if your dependency is updated to fix a critical security issue, you have to re-release any package that bundles it.
In the past, some packages did vendor dependencies into their distributions. requests was a prime example; it stepped away from this approach because it complicated the release process. Every time there was a bug in one of the vendored packages, they had to produce a new release to include the fix, for example.
If you do want to persist in including packages, you'll have to write your own support. I believe requests simply added the vendored packages to its repository by hand, keeping a static copy that it updated from time to time. Alternatively, you could extend setup.py to pull in the code when you are creating a distribution.
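A rough sketch of that last idea, assuming B's source lives in a sibling checkout (every name and path here is hypothetical, and this is a starting point rather than a supported mechanism):

    import shutil
    from setuptools import setup
    from setuptools.command.sdist import sdist as _sdist

    class vendored_sdist(_sdist):
        def run(self):
            # Copy package B's source into A's tree before the archive is
            # built; "../packageB/packageB" is an assumed checkout location.
            # (dirs_exist_ok requires Python 3.8+.)
            shutil.copytree("../packageB/packageB",
                            "packageA/_vendor/packageB",
                            dirs_exist_ok=True)
            _sdist.run(self)

    setup(
        name="packageA",
        version="1.0",
        packages=["packageA", "packageA._vendor",
                  "packageA._vendor.packageB"],
        cmdclass={"sdist": vendored_sdist},
    )

A's code would then have to import the copy from packageA._vendor, which is exactly the maintenance burden described above.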
I've seen a good bit of setuptools bashing on the internets lately. Most recently, I read James Bennett's On packaging post on why no one should be using setuptools. From my time in #python on Freenode, I know that there are a few souls there who absolutely detest it. I would count myself among them, but I do actually use it.
I've used setuptools for enough projects to be aware of its deficiencies, and I would prefer something better. I don't particularly like the egg format and how it's deployed. With all of setuptools' problems, I haven't found a better alternative.
My understanding of tools like pip is that they're meant to be an easy_install replacement (not a setuptools replacement). In fact, pip uses some setuptools components, right?
Most of my packages make use of a setuptools-aware setup.py, which declares all of the dependencies. When they're ready, I'll build an sdist, bdist, and bdist_egg, and upload them to pypi.
If I wanted to switch to using pip, what kind of changes would I need to make to rid myself of easy_install dependencies? Where are the dependencies declared? I'm guessing that I would need to get away from the egg format and provide just source distributions. If so, how do I generate the egg-info directories? Or do I even need to?
How would this change my usage of virtualenv? Doesn't virtualenv use easy_install to manage the environments?
How would this change my usage of the setuptools provided "develop" command? Should I not use that? What's the alternative?
I'm basically trying to get a picture of what my development workflow will look like.
Before anyone suggests it, I'm not looking for an OS-dependent solution. I'm mainly concerned with debian linux, but deb packages are not an option, for the reasons Ian Bicking outlines here.
pip uses Setuptools, and doesn't require any changes to packages. It actually installs packages with Setuptools, using:
python -c 'import setuptools; __file__="setup.py"; execfile(__file__)' \
    install \
    --single-version-externally-managed
Because it uses that option (--single-version-externally-managed) it doesn't ever install eggs as zip files, doesn't support multiple simultaneously installed versions of software, and the packages are installed flat (like python setup.py install works if you use only distutils). Egg metadata is still installed. pip also, like easy_install, downloads and installs all the requirements of a package.
In addition, you can use a requirements file to add other packages that should be installed in a batch, and to make version requirements more exact (without putting those exact requirements in your setup.py files). But if you don't make requirements files, you can use pip just like easy_install.
For your install_requires I don't recommend any changes, unless you have been trying to create very exact requirements there that are known to be good. I think there's a limit to how exact you can usefully be in setup.py files about versions, because you can't really know what the future compatibility of new libraries will be like, and I don't recommend you try to predict this. Requirement files are an alternate place to lay out conservative version requirements.
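To make that division of labor concrete, here is a minimal sketch (SomeLib and both version numbers are placeholders) showing the relevant line from each file:

    # In setup.py: a tolerant constraint that should survive future releases
    install_requires=["SomeLib>=1.0"]

    # In requirements.txt: the exact version known to work for a deployment
    SomeLib==1.0.4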
You can still use python setup.py develop, and in fact if you do pip install -e svn+http://mysite/svn/Project/trunk#egg=Project it will check that out (into src/project) and run setup.py develop on it. So that workflow isn't any different really.
If you run pip verbosely (like pip install -vv) you'll see a lot of the commands that are run, and you'll probably recognize most of them.
I'm writing this in April 2014. Be conscious of the date on anything written about Python packaging, distribution or installation. It looks like there's been some lessening of factiousness, improvement in implementations, PEP-standardizing and unifying of fronts in the last, say, three years.
For instance, the Python Packaging Authority is "a working group that maintains many of the relevant projects in Python packaging."
The python.org Python Packaging User Guide has Tool Recommendations and The Future of Python Packaging sections.
distribute was a fork of setuptools that was merged back in June 2013. The guide says, "Use setuptools to define projects and create Source Distributions."
As of PEP 453 and Python 3.4, the guide recommends, "Use pip to install Python packages from PyPI," and pip is included with Python 3.4 and installed in virtualenvs by pyvenv, which is also included. You might find the PEP 453 "rationale" section interesting.
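In practice, that means a freshly created environment already has pip available (POSIX shell shown; paths differ on Windows):

    python3 -m venv env
    env/bin/pip install requests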
There are also new and newish tools mentioned in the guide, including wheel and buildout.
I'm glad I read both of the following technical/semi-political histories.
By Martijn Faassen in 2009: A History of Python Packaging.
And by Armin Ronacher in June 2013 (the title is not serious): Python Packaging: Hate, hate, hate everywhere.
For starters, pip is really new. New, incomplete, and largely untested in the real world.
It shows great promise but until such time as it can do everything that easy_install/setuptools can do it's not likely to catch on in a big way, certainly not in the corporation.
Easy_install/setuptools is big and complex, and that offends a lot of people. Unfortunately, there's a really good reason for that complexity: it caters for a huge number of different use cases. My own is supporting a large (>300) pool of desktop users, plus a similarly sized grid, with a frequently updated application. The notion that we could do this by having every user install from source is ludicrous; eggs have proved themselves a reliable way to distribute my project.
My advice: learn to use setuptools; it's really a wonderful thing. Most of the people who hate it do not understand it, or simply do not have the use case for such a full-featured distribution system.
:-)