I'm perplexed about how best to use pip in the face of security concerns about malicious packages or install scripts. I'm not much of a security expert, so I may just be confused (bear with me), but it seems that there are 4, possibly overlapping, approaches:
(1) Use sudo pip for everything
This is how I do things now. I generally do not need virtualenvs and like the convenience of having all my packages work for all my tools. I also don't install a lot of experimental packages, sticking pretty much to the well-known and widely used ones (matplotlib, six, etc).
I gather this can be a risky approach, though, because the installation process runs with su (root) privileges and could potentially do anything; however, it has the advantage of protecting the site-packages directory from subsequent mischief by anything (not just packages) running as non-su after an install.
This approach also can't be completely avoided, as some packages (pip itself) need it to bootstrap any Python installation.
(2) Create a pip user and give it ownership of site-packages
This would seem to have the advantage of restricting what pip can do: all it can do is install to site-packages. But I'm not sure about side effects, or whether it would even work (when, for example, pip needs to put things in other locations). A more realistic variant of this is to set things up this way, and use pip as "pip-user" when that works, and as su when it doesn't.
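Roughly what I have in mind is the following (the site-packages path is just a guess, since the exact location varies by system, and somepackage is a placeholder; console scripts going into bin/ would presumably still fail):
sudo useradd --system pip-user
sudo chown -R pip-user /usr/lib/python3.4/site-packages   # exact path depends on distro and Python version
sudo -u pip-user pip install somepackage                  # can now only write where pip-user can write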
(3) Give myself ownership of site-packages
I gather this is a very bad idea, but I'm not sure quite why. It would mean that any code I run would be able to tamper with site-packages; but it would also mean that malicious install scripts could only damage things I can damage myself anyway.
(4) "Use a virtualenv"
This suggestion comes up a lot, but I don't see how it helps. It seems no different from 3 to me since it creates a site-packages that I own.
Which, if any, of these approaches or combinations of approaches is best for ensuring that pip does not end up exposing my system? My concern is mostly with my system as a whole, and only secondarily with my Python installation in site-packages (which I can always rebuild if need be).
Part of the problem I have is that I don't know how to weigh the risks. An example approach that seems to make sense to my limited understanding is simply to do (1) for the most part, and use a virtualenv (4) for any package that I worry might damage my site-packages. Anything I've installed will still be able to damage anything I have access to, but that seems unavoidable, and at least things I don't have access to will be safe (except during the installation process itself). But I have trouble evaluating whether the protection this affords is worth the risk it creates.
You probably want to look at using a virtualenv. To quote the docs:
Virtualenv is a tool to create isolated Python environments. The basic problem
being addressed is one of dependencies and versions, and indirectly
permissions.
Virtualenv will create a folder with an isolated copy of python, an isolated pip and an isolated site-packages. You're thinking that this is the same as option 3 because you're taking that advice you linked at face value and not reading into it:
If you give yourself write privilege to the system site-packages,
you're risking that any program that runs under you (not necessarily
python program) can inject malicious code into the system
site-packages and obtain root privilege.
The problem is not with having access to site-packages (you have to have privileges over a site-packages directory to be able to do anything). The problem is with having access to the system site-packages. A virtual environment's site-packages does not expose root privileges to malicious code the way the one your entire system is using does.
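To make that concrete, here is a minimal sketch (directory and package names are placeholders) of creating and using an isolated environment without sudo:
virtualenv ~/envs/scratch                          # creates an isolated python, pip and site-packages
~/envs/scratch/bin/pip install somepackage         # lands in the env's site-packages, not the system's
~/envs/scratch/bin/python -c "import somepackage"  # uses the isolated copy
Nothing under that directory is owned by root, so a hostile setup.py run there can only touch what your own user can already touch.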
However, I see nothing wrong with using sudo pip for well known and familiar packages. At the end of the day, it's like installing any other program, even non-python. If you go to its website and it looks honest and you trust it, there's no reason not to sudo.
Moreover, pip is pretty safe - it uses HTTPS for PyPI, and if you pass --allow-external it will download packages from third-party hosts but will still keep checksums on PyPI and compare them. For third-party hosts with no checksum you need to explicitly pass --allow-unverified, which is the only option considered unsafe.
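On the old pip releases that had those flags, the invocation looked roughly like this (somepackage is a placeholder; the flags take the package name as an argument):
pip install somepackage --allow-external somepackage                                 # externally hosted, checksum still verified via PyPI
pip install somepackage --allow-external somepackage --allow-unverified somepackage  # no checksum at all - the unsafe case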
As a personal note, I can add that I use sudo pip most of the time, but as a web developer virtualenv is kind of a day-to-day thing, and I can recommend using it as well (especially if you see anything sketchy but still want to try it out).
Related
I have been running a fairly complex Django application for about a year now. It has about 50 packages in requirements.txt.
Whenever I need a new package, I install it with pip, and then manually add it in the requirements.txt file with a fixed version:
SomeNewModule==1.2.3
That means that most of my packages are now outdated after a year. I have updated a couple of them manually when I specifically needed a new feature.
I am beginning to think that there might be security patches that I am missing out on, but I am reluctant to update them all blindly due to backward incompatibility.
Is there a standard best practice for this?
The common pattern for versioning Python modules (and much other software) is major.minor.patch,
where, after the initial release, patch versions don't change the API, minor releases may change the API in a backward-compatible way, and major releases usually aren't backward compatible,
so if you have module==x.y.z a relatively safe requirement specification would be:
module>=x.y.z,<x.y+1.0
Note that while this will usually be OK, it's based on common practice and not guaranteed to work, and it's more likely to hold for more "organized" modules.
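For example, if you currently pin requests==2.18.4 (version numbers purely illustrative), that rule translates to:
requests>=2.18.4,<2.19.0
Newer pip and setuptools also understand the PEP 440 compatible-release operator, which expresses the same constraint more compactly:
requests~=2.18.4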
Do not update packages blindly in production; you can break things. If you have a package that creates tables in the database and you update it, you can break your database. For example, I used python-social-auth and wanted to upgrade it to the latest version; to do that, I had to upgrade to an intermediate version x, run the migrations, and only then go to the latest version and migrate again.
Upgrade the packages in your development environment and test them there. If you have a pre-production environment, do the same there after testing in dev.
One of the most robust practices in the industry is to ensure that your code is well unit-tested, well property-tested, and well regression-tested. Once you've got good coverage and the tests are green, begin upgrading your dependencies. You might want to do this one at a time if you're feeling ambitious, or you might give it all a go and try your luck. If after upgrading your tests are still green and you can manually go through all workflows, chances are you're in the clear! Those tests you spent time on can then be leveraged going forward for any smaller, incremental upgrades.
It depends. Most of the time, updating packages for minor version changes (e.g. 2.45.1 -> 2.56.1) is unlikely to break your system, though it is still advisable to run extensive regression testing. Major version changes (e.g. 2.45.1 to 3.13.0), however, should generally be avoided, as most of them offer little backward compatibility. An example of this is the Selenium WebDriver, which in version 3.0 cannot run without geckodriver, unlike version 2.56. Regardless, extensive unit and regression testing should be carried out on the old and new code to ensure there are no unexpected changes, especially at the corner cases.
Considering that you have mentioned that you only use a single Python/pip installation, I can say that this will be an issue for future programs, as you may need to use different versions of the same package for different projects.
An acceptable practice to avoid such issues would be to use a virtual environment for each project. Just create a makefile in your project root and use:
virtualenv --python=${PYTHON} env
env/bin/pip install --upgrade pip
env/bin/pip install --upgrade setuptools wheel pip-tools
Then have env/bin/pip install the requirements from your requirements.txt file.
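That last step is simply (assuming requirements.txt sits in the project root):
env/bin/pip install -r requirements.txt
Since pip-tools is installed in the environment, env/bin/pip-sync requirements.txt can be an alternative that also removes packages no longer listed in the file.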
First, a little background on me: I'm a security test engineer and work heavily in Python. That being said, most of my work is test code, and the products I typically work on do not ship with Python running on the box, so I cannot give you direct answers to questions about specific Python packages. I can give you some rules of thumb about what most companies do when faced with such questions in a general sense.
Figure out if you need to update:
Run a security scanner (like Nessus). If you have any glaring version problems with your application stack or hosts, Nessus will give you some idea of things that should be fixed immediately.
Check the release notes for all of the packages to see if anything actually needs updating. Pay extra close attention to any package that opens a socket. If there are no security fixes it probably isn't worth your time to update. In some cases even if there is a security problem in a package, it may be something you aren't using so pay attention to the problem description. If you have an application in production now, typically the goal is to change as little as possible to limit the amount of code update and testing that needs to be done.
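As a quick starting point for that check, pip itself can list which installed packages have newer releases available (the exact output format varies by pip version):
pip list --outdated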
If you've discovered that you need an update, start resolving dependencies:
Probably the easiest way to do this is to set up a test environment and start installing the updated versions that you need. Once more, try to make as few changes as possible while installing your code in the test environment. In Python there is a reliance on C libraries for much of the security stuff, so in your test environment be sure to have the same lower-level system libs as well.
Create a list of changes that need to be made and start devising a test plan for the sections of code that you need to replace.
Once your development environment is set up, test, test, test. You'll probably want to rerun your security scanner and make sure to exercise all of the new code.
You are absolutely right: there will be backward-incompatibility issues. Do not update packages to the latest versions blindly. Most likely, you will run into package/module/class/variable/key undefined/not-found issues, especially if you have a complex system, even if you only use pip install --upgrade somepackage.
This is a lesson from my real experience.
I just downloaded Python sources, unpacked them to /usr/local/src/Python-3.5.1/, run ./configure and make there. Now, according to documentation, I should run make install.
But I don't want to install it somewhere in common system folders, create any links, change or add environment variables, doing anything outside this folder. In other words, I want it to be portable. How do I do it? Will /usr/local/src/Python-3.5.1/python get-pip.py install Pip to /usr/local/src/Python-3.5.1/Lib/site-packages/? Will /usr/local/src/Python-3.5.1/python work properly?
make altinstall, as I understand it, still creates links, which is not desired. Is it correct that it creates symbolic links as well but simply doesn't touch /usr/bin/python and the man pages?
Probably I should do ./configure --prefix=some/private/path and just make and make install there, but I still wonder whether it's possible to use Python without make install.
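For reference, the --prefix route I'm describing would give a fully self-contained tree under a path I own, without touching /usr/bin or the man pages (the path below is just an example):
./configure --prefix=$HOME/opt/python-3.5.1
make
make install
$HOME/opt/python-3.5.1/bin/python3.5 get-pip.py   # only needed if make install didn't already bootstrap pip; it lands in that prefix's site-packages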
If you don't want to copy the binaries you built into a shared location for system-wide use, you should not make install at all. If the build was successful, it will have produced binaries you can run. You may need to set up an environment for making them use local run-time files instead of the system-wide ones, but this is a common enough requirement for developers that it will often be documented in a README or similar (though as always when dealing with development sources, be prepared that it might not be as meticulously kept up to date as end-user documentation in a released version).
How do you cleanly remove Python when it was installed with make altinstall? I'm not finding an altuninstall or such in the makefile, nor does this seem to be a common question.
In this case I'm working with Python 2.7.x in Ubuntu, but I expect the answer would apply to earlier and later versions/sub-versions.
Why? I'm doing build tests of various Python sub-versions and would like to do those tests cleanly, so that there are no "leftovers" from other versions. I could wipe out everything in /usr/local/lib and /usr/local/bin but there may be other things there I'd like not to remove, so having a straightforward way to isolate the Python components for removal would be ideal.
As far as I know, there’s no automatic way to do this. But you can go into /usr/local and delete bin/pythonX and lib/pythonX (and maybe lib64/pythonX).
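Concretely, for an altinstalled 2.7 under /usr/local that comes down to something like this (paths are illustrative; double-check what the install actually created before removing anything):
ls /usr/local/bin /usr/local/lib | grep 2.7        # see what the altinstall put there
rm -rf /usr/local/lib/python2.7
rm -f /usr/local/bin/python2.7 /usr/local/bin/python2.7-config
# plus lib64/python2.7, include/python2.7 and the share/man entries, if present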
But more generally, why bother? The whole point of altinstall is that many versions can live together. So you don't need to delete them.
For your tests, what you should do is use virtualenv to create a new, clean environment, with whichever python version you want to use. That lets you keep all your altinstalled Python versions and still have a clean environment for tests.
Also, do the same (use virtualenv) for development. Then your altinstall'ed Pythons don't have site-packages. They just stay as clean, pristine references.
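For example, to get a clean environment on top of one particular altinstalled interpreter (paths are illustrative):
virtualenv -p /usr/local/bin/python2.7 ~/envs/test27
~/envs/test27/bin/python --version     # the altinstalled 2.7, but with its own empty site-packages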
In my workplace, I have to manage many (currently tens, but probably hundreds eventually) Python web applications, potentially running various frameworks, libraries, etc (all at various versions). Virtualenv has been a lifesaver in managing that so far, but I'd still like to be able to manage things better, particularly when it comes to managing package upgrades.
I've thought of a few scenarios
Option 1:
Install all required modules for each project in each virtualenv using pip, upgrade each individually as necessary. This would require a significant time cost for each upgrade and would require additional documentation to keep track of things. Might be facilitated by some management scripting.
Option 2:
Install all libraries used by any application in a central repository, use symlinks to easily change versions once for all projects. Easy upgrades and central management, but forgoes some of the nicest benefits of using virtualenv in the first place.
Option 3:
Hybridize the above two somehow, centralizing the most common libraries and/or those likely to need upgrades and installing the rest locally to each virtualenv.
Does anyone else have a similar situation? What's the best way to handle this?
You might consider using zc.buildout. It's more annoying to set up than plain pip/virtualenv, but it gives you more opportunities for automation. If the disk space usage isn't an issue, I'd suggest you just keep using individual environments for each project so you can upgrade them one at a time.
We have a requirements.pip file at our project root that contains the packages for pip to install, so upgrading automatically is relatively easy. I'm not sure symlinking would solve the issue - it would make it harder to upgrade only a subset of your projects. If disk space isn't an issue, and you can write some simple scripts to list and upgrade packages, I'd stick with virtualenv as-is.
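Such a script can stay tiny; here is a sketch assuming one virtualenv per project under a common directory (the layout and the requirements.pip name are assumptions about your setup):
for project in /srv/apps/*; do
    "$project/env/bin/pip" install --upgrade -r "$project/requirements.pip"
done
The same loop with pip list --outdated instead of the install line gives a per-project report of what is behind.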
I'm in a bit of a discussion with some other developers on an open source project. I'm new to python but it seems to me that site-packages is meant for libraries and not end user applications. Is that true or is site-packages an appropriate place to install an application meant to be run by an end user?
Once you get to the point where your application is ready for distribution, package it up for your favorite distributions/OSes in a way that puts your library code in site-packages and executable scripts on the system path.
Until then (i.e. for all development work), don't do any of the above: save yourself major headaches and use zc.buildout or virtualenv to keep your development code (and, if you like, its dependencies as well) isolated from the rest of the system.
We do it like this.
Most stuff we download is in site-packages. They come from pypi or Source Forge or some other external source; they are easy to rebuild; they're highly reused; they don't change much.
Most stuff we write is in other locations (usually under /opt, or c:\opt) AND is included in the PYTHONPATH.
There's no great reason for keeping our stuff out of site-packages. However, our feeble excuse is that our stuff changes a lot. Pretty much constantly. To reinstall in site-packages every time we think we have something better is a bit of a pain.
Since we're testing out of our working directories or SVN checkout directories, our test environments make heavy use of PYTHONPATH.
The development use of PYTHONPATH bled over into production. We use a setup.py for production installs, but install to an alternate home under /opt and set the PYTHONPATH to include /opt/ourapp-1.1.
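One way to get that kind of layout with a plain setup.py (the flag and paths here are illustrative, not necessarily exactly what we do):
python setup.py install --install-lib=/opt/ourapp-1.1     # put the modules under the app's own home
export PYTHONPATH=/opt/ourapp-1.1:$PYTHONPATH             # make that directory importable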
The program run by the end user is usually somewhere in their path, with most of the code in the module directory, which is often in site-packages.
Many python programs will have a small script located in the path, which imports the module, and calls a "main" method to run the program. This allows the programmer to do some upfront checks, and possibly modify sys.path if needed to find the needed module. This can also speed up load time on larger programs, because only files that are imported will be run from bytecode.
Site-packages is for libraries, definitely.
A hybrid approach might work: you can install the libraries required by your application in site-packages and then install the main module elsewhere.
If you can turn part of the application to a library and provide an API, then site-packages is a good place for it. This is actually how many python applications do it.
But from a user's or administrator's point of view, that isn't actually the problem. The problem is how we can manage the installed stuff: after I have installed it, how can I upgrade and uninstall it?
I use Fedora. If I use the Python that came with it, I don't like installing things to site-packages outside the RPM system. In some cases I have built an RPM myself to install things.
If I build my own python outside RPM, then I naturally want to use python's mechanisms to manage it.
A third way is to use something like easy_install to install such a thing, for example, as a user into the home directory.
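With a reasonably recent setuptools or pip, that per-user route is just (somepackage is a placeholder):
easy_install --user somepackage     # installs under ~/.local on Linux
pip install --user somepackage      # the pip equivalent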
So
Allow packaging for distributions.
Allow selecting which Python to use.
Allow using a Python installed by the distribution when you don't have permission to write to its site-packages.
Allow using a Python installed outside the distribution where you can use its site-packages.