Suppose I have the following PyPI indexes:
public PyPI (standard packages)
GitLab PyPI (because internal team ABC wanted to use this)
Artifactory PyPI (because contractor team DEF wanted to use this)
Now suppose a package with the same name exists on all of them but is not the same thing (for instance, "apples", which is 3 entirely different packages across the three indexes). How do I do something in my requirements and setup.py to map each package name to the index it should be installed from?
Something like:
package_def==1.2.3 --index-url=artifactory
apples==1.08 --index-url=gitlab # NOT FROM PUBLIC OR FROM ARTIFACTORY
package_abc==1.2.3 --index-url=artifactory
package_efg==1.0.0 # public pypi
I don't even know how I'd configure the setup.py in this instance either.
I really don't want multiple requirements.txt with different index urls at the top. I also don't want --extra-index-url due to the vulnerabilities it could introduce when using a private pypi.
I tried googling around, messing with the order of entries in requirements.txt, breaking it up into different files, etc. No luck. It seems the last --index-url is always the one used to install all packages.
Any ideas?
The question gets back to the idea that a package dependency specification is usually a statement of need that is independent of how that need should be satisfied.
So the dependency declaration "foo==1.0.0" (the thing declared as part of the package metadata) means "I need the package named foo at version 1.0.0", and that is in principle implementation-independent. You can install that package with pip from PyPI, but you could also use a different tool and/or a different source to satisfy that requirement (e.g. conda, installation from source, etc.).
This distinction is the reason why there's no good way to do this.
There are a few workarounds:
You can specify the full link to a wheel you want to pip install
You can use an alternative tool like Poetry, which does support this a little more cleanly.
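For reference, a rough sketch of how the Poetry option could look: Poetry lets you declare named sources in pyproject.toml and pin an individual dependency to one of them. The GitLab URL and project id below are placeholders, and the exact priority flag depends on your Poetry version (older releases used secondary = true, newer ones use priority = "explicit"):
# pyproject.toml (sketch; URL and project id are placeholders)
[[tool.poetry.source]]
name = "gitlab"
url = "https://gitlab.example.com/api/v4/projects/123/packages/pypi/simple"
priority = "explicit"

[tool.poetry.dependencies]
apples = { version = "1.08", source = "gitlab" }  # only ever resolved from the "gitlab" source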
For my particular use case, I just listed the full link to the wheel I wanted to pip install, since upgrading to Poetry is out of scope at the moment.
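For illustration, with reasonably recent pip such a requirements.txt can use direct URL references instead of bare names, so no index lookup happens for those entries. The URLs below are hypothetical:
# requirements.txt (sketch; direct links bypass index resolution entirely)
package_efg==1.0.0    # still resolved from the configured index
apples @ https://gitlab.example.com/api/v4/projects/123/packages/pypi/files/abc123/apples-1.08-py3-none-any.whl
package_abc @ https://artifactory.example.com/artifactory/api/pypi/pypi-local/package_abc/1.2.3/package_abc-1.2.3-py3-none-any.whl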
Related
In the python project I work on at my workplace, we install some packages from PyPI, and some private company packages from Gemfury, using a standard requirements file.
I'm asking after reading this article on dependency confusion: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610.
Our requirement file looks something like:
--index-url <OUR_GEMFURY_URL>
--extra-index-url https://pypi.python.org/simple
aiohttp==3.7.1
simplejson==3.17.1
<our-package>==1.0.0
<our-other-package>==1.2.0
I tried reading some of pip's documentation, but I wasn't able to fully understand how it chooses where to download a package from.
For example, what happens if someone uploads a malicious version 1.0.0 to the public PyPI - how does pip know which one of the packages to take?
Is there maybe a way to tell pip to search for a specific package only in the --index-url index?
How do you protect against dependency confusion in your code?
Thanks for the help!
The article mentions the algorithm pip uses:
Checks whether the library exists on the specified (internal) package index
Checks whether the library exists on the public package index (PyPI)
Installs whichever version is found. If the package exists on both, it defaults to installing from the source with the higher version number.
So if your script requires <our-other-package>>=1.2.0, you can end up with a malicious package from the public PyPI server if it carries a higher version number than the one you intended to install.
The straightforward solution mentioned in the article is removing --extra-index-url:
If a package, whether internal or external, is present on the private PyPI server, it will be downloaded from there.
External packages will be downloaded from the public PyPI server through the internal PyPI server, which will cache them for future use.
I'd also suggest pinning explicit versions in requirements.txt; this way you are aware of the versions you get and do conscious upgrades by increasing them.
To sum up the guidelines (which are by no means exhaustive and do not protect against all possible security holes):
remove --extra-index-url https://pypi.python.org/simple from pip.conf, requirements.txt and automation scripts.
specify explicit versions of internal and external packages in requirements.txt.
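A minimal sketch of what that could look like, assuming your private index proxies and caches public PyPI so external packages still resolve through it (the URL is a placeholder):
# pip.conf (or pass --index-url on the command line) - sketch
[global]
index-url = https://pypi.internal.example.com/simple/
# note: no extra-index-url pointing at the public index

# requirements.txt - pin exact versions of internal and external packages
aiohttp==3.7.1
simplejson==3.17.1
our-package==1.0.0
our-other-package==1.2.0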
I'm managing a python project which can be released in two different variants, "full" and "lightweight", called e.g. my-project and my-project-lw. Both use the same top-level name, e.g. myproject. I have a script that cuts off the "heavy" parts of the project and builds both wheel installable archives with dependencies (the lightweight has considerably fewer). Everything works, and I can install them using the wheels.
Now I would like to make sure that a user wouldn't have both packages installed at the same time. Ideally I'd like pip to uninstall one when installing the other, or at least fail when the other is present (such that the user would have to uninstall the current manually).
Otherwise, when you install the my-project package it installs into /lib/python3.6/site-packages/myproject, and then when you install the my-project-lw package it overwrites the files in the same folder, so you get a weird hybrid where some files are from "full" and some from "lightweight", which is not good.
Is there a way to specify an anti-dependency? To mark them somehow as mutually exclusive? Thanks!
Pip doesn't support this. See also the related 'Obsoletes' metadata: https://github.com/pypa/packaging-problems/issues/154
Is there any way, through either pip or PyPI, to identify which projects (published on PyPI) might be using my packages (also published on PyPI)? I would like to identify the user base for each package and possibly attempt to actively engage with them.
Thanks in advance for any answers - even if what I am trying to do isn't possible.
This is not really possible. There is no easily accessible public dataset that lets you produce a dependency graph.
At most you could scan all publicly available packages and parse their dependencies, but even then those dependencies are generated by running the setup.py script, so they can be set dynamically. It is quite common to adjust the dependencies based on the Python version (installing a backport of a standard-library module on older Python versions, for example). People have done this before, but it is not an easy, lightweight task.
Note that even then, you'll only find publicly declared dependencies. Any dependencies declared by private packages, not published to PyPI or another public repository, can't be accounted for.
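As a rough illustration of what that scanning involves, the public PyPI JSON API does expose the dependencies a project declared in its published metadata (requires_dist), so you can at least check individual candidate projects. A minimal sketch; the project and package names are placeholders, and dynamically computed or private dependencies will still not show up:
import json
import re
from urllib.request import urlopen

def declared_dependencies(project):
    # Fetch the project's metadata from the public PyPI JSON API.
    with urlopen(f"https://pypi.org/pypi/{project}/json") as resp:
        data = json.load(resp)
    return data["info"].get("requires_dist") or []

# Hypothetical check: which of these candidates declare a dependency on "mypackage"?
for candidate in ("someproject", "otherproject"):
    for dep in declared_dependencies(candidate):
        name = re.match(r"[A-Za-z0-9._-]+", dep)
        if name and name.group(0).lower() == "mypackage":
            print(candidate, "declares a dependency on mypackage")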
I have a private library called some-library (actual names have been changed) with a setup file looking somewhat like this:
setup(
    name='some-library',
    # Omitted some less important stuff here...
    install_requires=[
        'some-git-dependency',
        'another-git-dependency',
    ],
    dependency_links=[
        'git+ssh://git@github.com/my-organization/some-git-dependency.git#egg=some-git-dependency',
        'git+ssh://git@github.com/my-organization/another-git-dependency.git#egg=another-git-dependency',
    ],
)
All of these Git dependencies may be private, so installation via HTTP is not an option. I can use python setup.py install and python setup.py develop in some-library's root directory without problems.
However, installing over Git doesn't work:
pip install -vvv -e 'git+ssh://git@github.com/my-organization/some-library.git@1.4.4#egg=some-library'
The command fails when it looks for some-git-dependency, mistakenly assumes it needs to get the dependency from PyPI and then fails after concluding it's not on PyPI. My first guess was to try re-running the command with --process-dependency-links, but then this happened:
Cannot look at git URL git+ssh://git@github.com/my-organization/some-git-dependency.git#egg=some-git-dependency
Could not find a version that satisfies the requirement some-git-dependency (from some-library) (from versions: )
Why is it producing this vague error? What's the proper way to pip install a package with Git dependencies that might be private?
What's the proper way to pip install a package with Git dependencies that might be private?
Two options
Use dependency_links as you do. See below for details.
Alongside the dependency_links in your setup.py's, use a special dependency-links.txt that collects all the required packages. Then add this file to requirements.txt. That's my recommended option, as explained below.
# dependency-links.txt
git+ssh://...#tag#egg=package-name1
git+ssh://...#tag#egg=package-name2
# requirements.txt (per deployed application)
-r dependency-links.txt
While option 2 adds some extra burden on package management, namely keeping dependency-links.txt up to date, it makes installing packages a lot easier because you can't forget to add the --process-dependency-links option on pip install.
Perhaps more importantly, using dependency-links.txt you get to specify the exact version to be installed on deployment, which is what you want in a CI/CD environment - nothing is riskier than installing just "some" version. From a package maintainer's perspective, however, it is common and considered good practice to specify a minimum version, such as
# setup.py in a package
...
install_requires = [ 'foo>1.0', ... ]
That's great because it makes your packages work nicely with other packages that have similar dependencies yet possibly on different versions. However, in a deployed application this can still cause mayhem if there are conflicting requirements among packages. E.g. package A is ok with foo>1.0, package B wants foo<=1.5 and the most recent version is foo==2.0. Using dependency-links.txt you can be specific, applying one version for all packages:
# dependency-links.txt
foo==1.5
The command fails when it looks for some-git-dependency,
To make it work, you need to add --process-dependency-links for pip to recognize the dependencies hosted on GitHub, e.g.
pip install --process-dependency-links -r private-requirements.txt
Note that since pip 8.1.0 you can add this option to requirements.txt. On the downside, it then gets applied to all packages installed and may have unintended consequences. That said, I find using dependency-links.txt the safer and more manageable solution.
All of these Git dependencies may be private
There are three options:
Add collaborators on each of the required packages' repositories. These collaborators need to have their SSH keys set up with GitHub for this to work. Then use git+ssh://...
Add a deploy key to each of the repositories. The downside here is that you need to distribute the corresponding private key to all the machines that need to deploy. Again use git+ssh://...
Add a personal access token on the GitHub account that holds the private repositories. Then you can use git+https://accesstoken@github.com/... The downside is that the access token will have read + write access to all repositories, public and private, on the respective GitHub account. On the plus side, distributing and managing per-repository private keys is no longer necessary, and cycling the key is a lot simpler. In an all-in-house environment where every dev has access to all repositories, I have found this to be the most efficient, hassle-free way for everybody. YMMV
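For illustration, a requirements.txt (or dependency-links.txt) entry using such a token could look like this; the token placeholder, organization and tag are made up:
# ACCESS_TOKEN, organization and tag are placeholders
git+https://ACCESS_TOKEN@github.com/my-organization/some-git-dependency.git@1.2.3#egg=some-git-dependency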
This should work for private repositories as well:
dependency_links = [
    'git+ssh://git@github.com/my-organization/some-git-dependency.git@master#egg=some-git-dependency',
    'git+ssh://git@github.com/my-organization/another-git-dependency.git@master#egg=another-git-dependency'
],
You should use git+git when the URL carries an #egg fragment, like this:
-e git+git@repo.some.la:foo/my-repo.git#egg=my-repo
Use git+ssh in production without #egg, but you can specify a version or branch with @, e.g. @1.1.6 or @master:
git+ssh://git@repo.some.la/foo/my-repo.git@1.1.6
To work with application versions, use git tagging (see Git Basics - Tagging).
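For example, assuming a hypothetical repository, you would tag a release in the dependency's repository and then pin to that tag:
# in the dependency's repository
git tag 1.1.6
git push origin 1.1.6
# then install against the tag
pip install 'git+ssh://git@repo.some.la/foo/my-repo.git@1.1.6#egg=my-repo'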
If I refer to "pip install dependency links", you would not refer to the GitHub repo itself, but to the tarball image associated with that GitHub repo:
dependency_links=[
    'git+ssh://git@github.com/my-organization/some-git-dependency/tarball/master/#egg=some-git-dependency',
    'git+ssh://git@github.com/my-organization/another-git-dependency/tarball/master/#egg=another-git-dependency',
],
with "some-git-dependency" being the name *and version of the dependency.
"Cannot look at git URL git+ssh://git#github.com/my-organization/some-git-dependency.git#egg=some-git-dependency" means pip cannot fetch an html page from this url to look for direct download links in the page, i.e, pip doesn't recognize the URL as a vcs checkout, because maybe some discrepancy between the requirement specifier and the fragment part in the vcs url.
In the case of a VCS checkout, you should also append #egg=project-version in order to identify for what package that checkout should be used.
Be sure to escape any dashes in the name or version by replacing them with underscores.
Check "Dependencies that aren't in PyPI" in the pip documentation:
Replace - with an _ in the package and version string.
git+ssh://git@github.com/my-organization/some-git-dependency.git#egg=some_git_dependency
and --allow-all-external may be useful.
I have packages A and B, both have their own git repository, PyPI page, etc... Package A depends on package B, and by using the install_requires keyword I can get A to automatically download and install B.
But suppose I want to go a step further for my especially non-savvy users; I want to actually include package B within the tar/zip for package A, so no download is necessary (this also lets them potentially make any by-hand edits to package B's setup.cfg)
Is there a suggested (ideally automated) way to:
Include B in A when I call sdist for A
Tell setuptools that B is bundled with A for resolving the dependency (something like a local dependency_links)
Thanks!
This is called 'vendoring' and no, there is no support for such an approach.
It is also a bad idea; you want to leave installation to the specialized tools, which manage not only dependencies but also which versions are installed. With tools like buildout or pip's requirements.txt format you can control very precisely which versions are used.
By bundling a version of a dependency inside, you either force upon your users whatever version you picked, or you make it harder for such tools to ensure the versions used in a given installation are consistent. In addition, you are potentially wasting bandwidth and space: if other packages have bundled the same requirements, you now have multiple copies. And if your dependency is updated to fix a critical security issue, you have to re-release any package that bundles it.
In the past, some packages did use vendoring to include a dependency in their distribution. requests was a prime example; they stepped away from this approach because it complicated their release process. Every time there was a bug in one of the vendored packages, they had to produce a new release to include the fix, for example.
If you do want to persist in including packages, you'll have to write your own support. I believe requests simply added the vendored packages to their repository manually, keeping a static copy they updated from time to time. Alternatively, you could extend setup.py to download the code when you are creating a distribution.
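If you go the setup.py route, a very rough sketch could look like the following; the repository URL, package names and layout are placeholders, and this is not a tested recipe (real vendoring also means rewriting imports and keeping the copy up to date):
# setup.py (sketch): pull package B into package A's source tree before building the sdist
import os
import subprocess
from setuptools import setup, find_packages
from setuptools.command.sdist import sdist as _sdist

VENDOR_DIR = os.path.join("package_a", "_vendor", "package_b")

class VendoredSdist(_sdist):
    def run(self):
        # Fetch a shallow copy of package B into a _vendor directory (placeholder URL).
        if not os.path.exists(VENDOR_DIR):
            subprocess.check_call([
                "git", "clone", "--depth", "1",
                "https://example.com/your-org/package-b.git", VENDOR_DIR,
            ])
        _sdist.run(self)

setup(
    name="package-a",
    version="1.0.0",
    packages=find_packages(),
    cmdclass={"sdist": VendoredSdist},
)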