We maintain a private PyPI repository on a GitLab instance, to which we upload hundreds of Python wheels for different platforms, architectures and Python versions in order to use them internally.
As the number of packages grows, we only want to upload packages that do not already exist in the repository, so we want to check in advance what is already available. We have looked into pip search, but it only seems to work for single packages.
Is there any way to check for all available packages on a package repository including their versions, architectures, etc.?
I was hoping for something like pip list but for the repository.
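One possible approach, sketched under assumptions: if the repository serves a PEP 503 style "simple" index page per project, the wheel links on that page can be parsed, and each wheel filename already encodes version, Python tag, ABI tag and platform. The names LinkCollector, parse_wheel_filename and list_wheels, and the example.invalid URLs, are made up for illustration. Fetching the page (and any authentication) is left out.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a simple-index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def parse_wheel_filename(filename):
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # The optional build tag sits between version and python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"not a valid wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def list_wheels(index_html):
    """Return one dict per wheel linked from the given index page."""
    collector = LinkCollector()
    collector.feed(index_html)
    wheels = []
    for href in collector.hrefs:
        # urlparse() keeps the "#sha256=..." fragment out of the path.
        filename = posixpath.basename(urlparse(href).path)
        if filename.endswith(".whl"):
            wheels.append(parse_wheel_filename(filename))
    return wheels


# Hypothetical page content, standing in for a real HTTP response.
WHEEL = "demo_pkg-1.2.0-cp311-cp311-manylinux_2_17_x86_64.whl"
SAMPLE_PAGE = (
    "<html><body>"
    f'<a href="https://example.invalid/files/{WHEEL}#sha256=abc">whl</a>'
    '<a href="https://example.invalid/files/demo_pkg-1.2.0.tar.gz">sdist</a>'
    "</body></html>"
)

for wheel in list_wheels(SAMPLE_PAGE):
    print(wheel)
```

Comparing these dictionaries against the wheels about to be uploaded would show which ones are missing from the repository.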
In the python project I work on at my workplace, we install some packages from PyPI, and some private company packages from Gemfury, using a standard requirements file.
I recently read this article: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610
Our requirement file looks something like:
--index-url <OUR_GEMFURY_URL>
--extra-index-url https://pypi.python.org/simple
aiohttp==3.7.1
simplejson==3.17.1
<our-package>==1.0.0
<our-other-package>==1.2.0
I tried reading some of pip's documentation but I wasn't able to fully understand how it chooses from where to download the package.
For example, what happens if someone uploads a malicious version 1.0.0 to pypi-prod - how does pip know which one of the packages to take?
Is there maybe a way to specify to pip for a specific package to only search for it in --index-url?
How do you protect against dependency confusion in your code?
Thanks for the help!
The article mentions the algorithm pip uses:
1. Checks whether the library exists on the specified (internal) package index.
2. Checks whether the library exists on the public package index (PyPI).
3. Installs whichever version is found. If the package exists on both, it defaults to installing from the source with the higher version number.
So if your script requires <our-other-package>>=1.2.0, you can get a malicious package from the public PyPI server if it has a higher version than the one you intended to install.
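The consequence of those steps can be illustrated with a toy model. This is only a sketch of the behaviour the article describes, not pip's actual resolver: the function names, the index labels and the version 99.0.0 are invented, and versions are assumed to be plain dotted integers.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(candidates, minimum):
    """Return the (index, version) pair with the highest version >= minimum.

    The index a candidate came from plays no role in the choice, which is
    exactly what makes dependency confusion possible.
    """
    floor = parse_version(minimum)
    allowed = [c for c in candidates if parse_version(c[1]) >= floor]
    if not allowed:
        return None
    return max(allowed, key=lambda c: parse_version(c[1]))


# The same package name is available from both indexes.
candidates = [("internal", "1.2.0"), ("public", "99.0.0")]
chosen = pick_candidate(candidates, "1.2.0")
print(chosen)
```

With the requirement >=1.2.0, the candidate from the public index is selected simply because its version number is higher.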
The straightforward solution mentioned in the article is to remove --extra-index-url, so that pip only talks to the private index.
Any package, internal or external, that is present on the private PyPI server will then be downloaded from there.
External packages will be downloaded from the public PyPI server through the internal PyPI server, which caches them for future use.
I'd also suggest pinning explicit versions in requirements.txt. That way you know which versions you get, and you upgrade consciously by raising the version numbers.
To sum up the guidelines (which are by no means exhaustive and do not protect against every possible security hole):
remove --extra-index-url https://pypi.python.org/simple from pip.conf, requirements.txt and automation scripts.
specify explicit versions of internal and external packages in requirements.txt
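Both guidelines can be checked mechanically. The following is a minimal sketch, assuming a simple requirements file where each requirement sits on its own line; the function name audit_requirements and the sample content are hypothetical, and the pattern deliberately ignores more exotic requirement syntax such as environment markers or direct URLs.

```python
import re

# A requirement counts as pinned if it looks like "name==version"
# (optionally with extras, e.g. "name[extra]==version").
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^=\s]+")


def audit_requirements(text):
    """Return (line_number, message) pairs for lines that break the guidelines."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments that start the line or follow whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        if line.startswith("--extra-index-url"):
            findings.append((number, "uses --extra-index-url"))
        elif line.startswith("-"):
            continue  # other options, e.g. --index-url
        elif not PINNED.match(line):
            findings.append((number, "version is not pinned with =="))
    return findings


SAMPLE = """\
--index-url https://example.invalid/simple
--extra-index-url https://pypi.org/simple
aiohttp==3.7.1
simplejson>=3.17.1
"""

findings = audit_requirements(SAMPLE)
for number, message in findings:
    print(f"line {number}: {message}")
```

A check like this could run in CI so that a reintroduced --extra-index-url or an unpinned requirement fails the build.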
I work on a system where the hosting guys don't want to use an install script that uses pip. We currently have a large pip requirements file that installs the dependencies. Is there any other way to do it than using pip? Can it be done using yum or apt-get? We are using Linux.
For god's sake, please do not fall back to using the distribution's package manager just because your hosting guys do not understand what pip+virtualenv is good for.
Python packages in Linux distribution repositories are often outdated and may come with quirks that other Python package authors did not plan for. This is especially true for Python packages with compiled code. If the documentation tells you that a certain dependency should be obtained directly from PyPI via pip, then you had better follow that requirement. Convince your hosting guys to use the right tools, namely pip combined with virtualenv. The latter will create an isolated environment and make sure that the system stays clean (really, nobody needs to do a sudo pip install, which is probably what your hosting guys are afraid of).
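To show how little is involved, here is a minimal sketch that creates such an isolated environment. It uses the standard-library venv module (available since Python 3.3) in place of the virtualenv tool named above; the function name create_environment and the directory name are hypothetical.

```python
import os
import venv
from pathlib import Path


def create_environment(target, with_pip=True):
    """Create an isolated environment in `target` and return its interpreter path."""
    venv.create(target, with_pip=with_pip)
    # Windows puts the interpreter in Scripts\, other platforms in bin/.
    if os.name == "nt":
        return Path(target) / "Scripts" / "python.exe"
    return Path(target) / "bin" / "python"


if __name__ == "__main__":
    python_path = create_environment("project-env")
    # Packages installed with this interpreter stay inside project-env:
    #   <python_path> -m pip install -r requirements.txt
    print(python_path)
```

Nothing here needs root privileges, and deleting the directory removes the environment completely.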
I need to install Python on a server to run scripts, but the server has no access to the internet.
The server has access to a local network that has access to the internet*. I would like to use pip to manage the packages through a local network directory as specified here.
How can I install pip, Python and their dependencies on a Windows machine offline, so that I can use pip, as specified in the link above, to manage the packages I require?
*For Clarity: I have no ability to mirror, hack or otherwise to get information to pass through the local network directly from the internet.
The official Python installer for Windows has no other dependencies. It runs completely offline.
For other packages that may have dependencies (which are difficult to install on Windows), Christoph Gohlke maintains a list of Windows installers for common Python packages. These are msi installers (or whl files) that are self-contained.
They are designed to work with the official Python installer for Windows - as they use its registry entries to identify the install location.
You can download these and move them to your Windows machine.
Beyond those two, if you have further requirements you can use tools like basket to download packages and then provide the download location as a source for offline pip installs, or create your own pip repository.
If you do decide to create a local pip repository, it is better to create a pip proxy (see pypicache for example). This way you only request the packages that are actually required, rather than trying to mirror the entire cheeseshop.
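As an alternative to a dedicated tool, pip itself can do the download-then-install-offline round trip. The sketch below only builds the two command lines; the function names and the paths are placeholders. It assumes the downloading machine matches the target's platform and Python version, since pip download otherwise fetches files for the machine it runs on.

```python
import sys


def download_command(requirements, wheel_dir):
    """Command for a machine WITH internet access: fetch everything into wheel_dir."""
    return [
        sys.executable, "-m", "pip", "download",
        "--requirement", requirements,
        "--dest", wheel_dir,
    ]


def offline_install_command(requirements, wheel_dir):
    """Command for the offline machine: install only from the copied directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact a package index
        "--find-links", wheel_dir,  # look for files in this directory instead
        "--requirement", requirements,
    ]


if __name__ == "__main__":
    # Placeholder paths; pass either list to subprocess.run(..., check=True).
    print(download_command("requirements.txt", "wheels"))
    print(offline_install_command("requirements.txt", "wheels"))
```

The wheels directory is what gets copied across the local network between the two steps.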
Does Python have a package/module management system, similar to how Ruby has rubygems where you can do gem install packagename?
On Installing Python Modules, I only see references to python setup.py install, but that requires you to find the package first.
Recent progress
March 2014: Good news! Python 3.4 ships with Pip. Pip has long been Python's de-facto standard package manager. You can install a package like this:
pip install httpie
Wahey! This is the best feature of any Python release. It makes the community's wealth of libraries accessible to everyone. Newbies are no longer excluded from using community libraries by the prohibitive difficulty of setup.
However, there remain a number of outstanding frustrations with the Python packaging experience. Cumulatively, they make Python very unwelcoming for newbies. Also, the long history of neglect (i.e. not shipping with a package manager for 14 years from Python 2.0 to Python 3.3) did damage to the community. I describe both below.
Outstanding frustrations
It's important to understand that while experienced users are able to work around these frustrations, they are significant barriers to people new to Python. In fact, the difficulty and general user-unfriendliness is likely to deter many of them.
PyPI website is counter-helpful
Every language with a package manager has an official (or quasi-official) repository for the community to download and publish packages. Python has the Python Package Index, PyPI. https://pypi.python.org/pypi
Let's compare its pages with those of RubyGems and Npm (the Node package manager).
https://rubygems.org/gems/rails RubyGems page for the package rails
https://www.npmjs.org/package/express Npm page for the package express
https://pypi.python.org/pypi/simplejson/ PyPI page for the package simplejson
You'll see the RubyGems and Npm pages both begin with a one-line description of the package, then large friendly instructions how to install it.
Meanwhile, woe to any hapless Python user who naively browses to PyPI. On https://pypi.python.org/pypi/simplejson/ , they'll find no such helpful instructions. There is, however, a large green 'Download' link. It's not unreasonable to follow it. Aha, they click! Their browser downloads a .tar.gz file. Many Windows users can't even open it, but if they persevere they may eventually extract it, then run setup.py and eventually, with the help of Google, setup.py install. Some will give up and reinvent the wheel.
Of course, all of this is wrong. The easiest way to install a package is with a Pip command. But PyPI didn't even mention Pip. Instead, it led them down an archaic and tedious path.
Error: Unable to find vcvarsall.bat
Numpy is one of Python's most popular libraries. Try to install it with Pip, you get this cryptic error message:
Error: Unable to find vcvarsall.bat
Trying to fix that is one of the most popular questions on Stack Overflow: "error: Unable to find vcvarsall.bat"
Few people succeed.
For comparison, in the same situation, Ruby prints this message, which explains what's going on and how to fix it:
Please update your PATH to include build tools or download the DevKit from http://rubyinstaller.org/downloads and follow the instructions at http://github.com/oneclick/rubyinstaller/wiki/Development-Kit
Publishing packages is hard
Ruby and Nodejs ship with full-featured package managers, Gem (since 2007) and Npm (since 2011), and have nurtured sharing communities centred around GitHub. Npm makes publishing packages as easy as installing them, and it already has 64k packages. RubyGems lists 72k packages. The venerable Python package index lists only 41k.
History
Flying in the face of its "batteries included" motto, Python shipped without a package manager until 2014.
Until Pip, the de facto standard was a command called easy_install. It was woefully inadequate. There was no command to uninstall packages.
Pip was a massive improvement. It had most of the features of Ruby's Gem. Unfortunately, Pip was, until recently, ironically difficult to install. In fact, the problem remains a top Python question on Stack Overflow: "How do I install pip on Windows?"
And just to provide a contrast, there's also pip.
The Python Package Index (PyPI) seems to be standard:
To install a package:
pip install MyProject
To update a package:
pip install --upgrade MyProject
To fix a version of a package:
pip install MyProject==1.0
You can install the package manager as follows:
curl -O http://python-distribute.org/distribute_setup.py
python distribute_setup.py
easy_install pip
References:
http://guide.python-distribute.org/
http://pypi.python.org/pypi/distribute
As a Ruby and Perl developer and learning-Python guy, I haven't found easy_install or pip to be the equivalent to RubyGems or CPAN.
I tend to keep my development systems running the latest versions of modules as the developers update them, and freeze my production systems at set versions. Both RubyGems and CPAN make it easy to find modules by listing what's available, then install and later update them individually or in bulk if desired.
easy_install and pip make it easy to install a module ONCE I located it via a browser search or learned about it by some other means, but they won't tell me what is available. I can explicitly name the module to be updated, but the apps won't tell me what has been updated nor will they update everything in bulk if I want.
So, the basic functionality is there in pip and easy_install but there are features missing that I'd like to see that would make them friendlier and easier to use and on par with CPAN and RubyGems.
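Newer pip releases can at least report what is outdated: pip list --outdated --format=json prints one JSON object per outdated package. Bulk upgrades still have to be scripted around that. The sketch below assumes that JSON shape (name, version and latest_version fields) and uses a made-up package entry; upgrade_commands is a hypothetical helper, not part of pip.

```python
import json


def upgrade_commands(outdated_json):
    """Build one 'pip install --upgrade' command per outdated package."""
    packages = json.loads(outdated_json)
    return [
        ["pip", "install", "--upgrade", f"{pkg['name']}=={pkg['latest_version']}"]
        for pkg in packages
    ]


# Made-up sample standing in for the output of
#   pip list --outdated --format=json
SAMPLE_OUTPUT = json.dumps([
    {
        "name": "demo-package",
        "version": "1.0",
        "latest_version": "1.1",
        "latest_filetype": "wheel",
    },
])

commands = upgrade_commands(SAMPLE_OUTPUT)
for command in commands:
    print(" ".join(command))
```

Printing the commands first, rather than running them directly, leaves room to review each upgrade before it is applied.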
There are at least two, easy_install and its successor pip.
As of at least late 2014, Continuum Analytics' Anaconda Python distribution with the conda package manager should be considered. It solves most of the serious issues people run into with Python in general (managing different Python versions, updating Python versions, package management, virtual environments, Windows/Mac compatibility) in one cohesive download.
It enables you to do pretty much everything you could want to with Python without having to change the system at all. My next preferred solution is pip + virtualenv, but you either have to install virtualenv into your system Python (and your system Python may not be the version you want), or build from source. Anaconda makes this whole process the click of a button, as well as adding a bunch of other features.
That'd be easy_install.
It's called setuptools. You run it with the "easy_install" command.
You can find the directory at http://pypi.python.org/
I don't see either MacPorts or Homebrew mentioned in other answers here, but since I do see them mentioned elsewhere on Stack Overflow for related questions, I'll add my own US$0.02 that many folks seem to consider MacPorts as not only a package manager for packages in general (as of today they list 16311 packages/ports, 2931 matching "python", albeit only for Macs), but also as a decent (maybe better) package manager for Python packages/modules:
Question
"...what is the method that Mac python developers use to manage their modules?"
Answers
"MacPorts is perfect for Python on the Mac."
"The best way is to use MacPorts."
"I prefer MacPorts..."
"With my MacPorts setup..."
"I use MacPorts to install ... third-party modules tracked by MacPorts"
SciPy
"Macs (unlike Linux) don’t come with a package manager, but there are a couple of popular package managers you can install.
Macports..."
I'm still debating on whether or not to use MacPorts myself, but at the moment I'm leaning in that direction.
On Windows, install Chocolatey (http://chocolatey.org/), then run
choco install python
Open a new cmd-window with the updated PATH. Next, do
choco install pip
After that you can
pip install pyside
pip install ipython
...
Since no one has mentioned pipenv here, I would like to explain why I think everyone should use it for managing Python packages.
As @ColonelPanic mentioned, there are several issues with the Python Package Index, and with pip and virtualenv as well.
Pipenv solves most of the issues with pip and provides additional features also.
Pipenv features
Pipenv is intended to replace pip and virtualenv, which means pipenv will automatically create a separate virtual environment for every project thus avoiding conflicts between different python versions/package versions for different projects.
Enables truly deterministic builds, while easily specifying only what you want.
Generates and checks file hashes for locked dependencies.
Automatically installs required Pythons, if pyenv is available.
Automatically finds your project home, recursively, by looking for a Pipfile.
Automatically generates a Pipfile, if one doesn’t exist.
Automatically creates a virtualenv in a standard location.
Automatically adds/removes packages to a Pipfile when they are un/installed.
Automatically loads .env files, if they exist.
If you have worked on python projects before, you would realize these features make managing packages way easier.
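The hash-checking feature in the list above boils down to comparing a downloaded file's digest with the digests recorded in the lock file. Here is a minimal sketch of that idea. It is not pipenv's own code: the function names are hypothetical, and the "sha256:<hex digest>" string format is assumed to match how the lock file records hashes.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_locked_hash(path, locked_hashes):
    """True if the file's digest is among the hashes recorded at lock time."""
    return f"sha256:{sha256_of(path)}" in locked_hashes
```

A file that was tampered with after locking produces a different digest, so the check fails and installation can be refused.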
Other Commands
check checks for security vulnerabilities and asserts that PEP 508 requirements are being met by the current environment. (which I think is a great feature especially after this - Malicious packages on PyPi)
graph will show you a dependency graph, of your installed dependencies.
You can read more about it here - Pipenv.
Installation
You can find the installation documentation here
P.S.: If you liked working with the Python package requests, you will be pleased to know that pipenv is by the same developer, Kenneth Reitz.
In 2019 poetry is the package and dependency manager you are looking for.
https://github.com/sdispater/poetry#why
It's modern, simple and reliable.
Poetry is what you're looking for. It takes care of dependency management, virtual environments, running.