I have a virtualenv that serves a few separate projects. I'd like to write a utility library that each of these projects can use. It seems to make sense to stick that in the virtualenv. By all means shoot me down now but please give an alternative if you do.
Assuming I'm not completely crazy though, Where's the best place to stick my library?
My virtualenv sticks everything I install with pip in lib/pyton2.7/site-packages. I wonder if it would make more sense to follow suit or to hack in a separate home (further up the directory tree) so if things do ever clash, my work isn't overwritten by pip (et al).
If your project follows the standard packaging practices with setuptools, then all you have to do is run python setup.py develop inside the virtualenvs that you want the library to be used for. A .egg-link file will be created pointing to your package from which your other libraries will use much like any other packages, with the added benefit that your latest changes will be available to all packages at the same time (if that's your intention). If not, then either you could call python setup.py install or use multiple versions at different locations on the file system.
To get yourself started, take a look at Getting Started With setuptools and setup.py (you can skip the part on registering your package on pypi if this is your private work).
Another relevant stackoverflow thread: setup.py examples? The Hitchhiker's Guide to Packaging can also be quite useful.
Related
I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?
You can think of virtualenv as an isolation for every package you install using pip. It is a simple way to handle different versions of python and packages. For instance you have two projects which use same packages but different versions of them. So, by using virtualenv you can isolate those two projects and install different version of packages separately, not on your working system.
Now, let's say, you want work on a project with your friend. In order to have the same packages installed you have to share somehow what versions and which packages your project depends on. If you are delivering a reusable package (a library) then you need to distribute it and here where setup.py helps. You can learn more in Quick Start
However, if you work on a web site, all you need is to put libraries versions into a separate file. Best practice is to create separate requirements for tests, development and production. In order to see the format of the file - write pip freeze. You will be presented with a list of packages installed on the system (or in the virtualenv) right now. Put it into the file and you can install it later on another pc, with completely clear virtualenv using pip install -r development.txt
And one more thing, please do not put strict versions of packages like pip freeze shows, most of time you want >= at least X.X version. And good news here is that pip handles dependencies by its own. It means you do not have to put dependent packages there, pip will sort it out.
Talking about deploy, you may want to check tox, a tool for managing virtualenvs. It helps a lot with deploy.
Python default package path always point to system environment, that need Administrator access to install. Virtualenv able to localised the installation to an isolated environment.
For deployment/distribution of package, you can choose to
Distribute by source code. User need to run python setup.py --install, or
Pack your python package and upload to Pypi or custom Devpi. So the user can simply use pip install <yourpackage>
However, as you notice the issue on top : without virtualenv, they user need administrator access to install any python package.
In addition, the Pypi package worlds contains a certain amount of badly tested package that doesn't work out of the box.
Note : virtualenv itself is actually a hack to achieve isolation.
I had my fair chance of getting through the python management of modules, and every time is a challenge: packaging is not what people do every day, and it becomes a burden to learn, and a burden to remember, even when you actually do it, since this happens normally once.
I would like to collect here the definitive overview of how import, package management and distribution works in python, so that this question becomes the definitive explanation for all the magic that happens under the hood. Although I understand the broad level of the question, these things are so intertwined that any focused answer will not solve the main problem: understand how all works, what is outdated, what is current, what are just alternatives for the same task, what are the quirks.
The list of keywords to refer to is the following, but this is just a sample out of the bunch. There's a lot more and you are welcome to add additional details.
PyPI
setuptools / Distribute
distutils
eggs
egg-link
pip
zipimport
site.py
site-packages
.pth files
virtualenv
handling of compiled modules in eggs (with and without installation via easy_install)
use of get_data()
pypm
bento
PEP 376
the cheese shop
eggsecutable
Linking to other answers is probably a good idea. As I said, this question is for the high-level overview.
For the most part, this is an attempt to look at the packaging/distribution side, not the mechanics of import. Unfortunately, packaging is the place where Python provides way more than one way to do it. I'm just trying to get the ball rolling, hopefully others will help fill what I miss or point out mistakes.
First of all there's some messy terminology here. A directory containing an __init__.py file is a package. However, most of what we're talking about here are specific versions of packages published on PyPI, one of it's mirrors, or in a vendor specific package management system like Debian's Apt, Redhat's Yum, Fink, Macports, Homebrew, or ActiveState's pypm.
These published packages are what folks are trying to call "Distributions" going forward in an attempt to use "Package" only as the Python language construct. You can see some of that usage in PEP-376 PEP-376.
Now, your list of keywords relate to several different aspects of the Python Ecosystem:
Finding and publishing python distributions:
PyPI (aka the cheese shop)
PyPI Mirrors
Various package management tools / systems: apt, yum, fink, macports, homebrew
pypm (ActiveState's alternative to PyPI)
The above are all services that provide a place to publish Python distributions in various formats. Some, like PyPI mirrors and apt / yum repositories can be run on your local machine or within your companies network but folks typically use the official ones. Most, if not all provide a tool (or multiple tools in the case of PyPI) to help find and download distributions.
Libraries used to create and install distributions:
setuptools / Distribute
distutils
Distutils is the standard infrastructure on which Python packages are compiled and built into distributions. There's a ton of functionality in distutils but most folks just know:
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward#python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
And to some extent that's a most of what you need. With the prior 9 lines of code you have enough information to install a pure Python package and also the minimal metadata required to publish that package a distribution on PyPI.
Setuptools provides the hooks necessary to support the Egg format and all of it's features and foibles. Distribute is an alternative to Setuptools that adds some features while trying to be mostly backwards compatible. I believe Distribute is going to be included in Python 3 as the successor to Distutil's from distutils.core import setup.
Both Setuptools and Distribute provide a custom version of the distutils setup command
that does useful things like support the Egg format.
Python Distribution Formats:
source
eggs
Distributions are typically provided either as source archives (tarball or zipfile). The standard way to install a source distribution is by downloading and uncompressing the archive and then running the setup.py file inside.
For example, the following will download, build, and install the Pygments syntax highlighting library:
curl -O -G http://pypi.python.org/packages/source/P/Pygments/Pygments-1.4.tar.gz
tar -zxvf Pygments-1.4.tar.gz
cd Pygments-1.4
python setup.py build
sudo python setup.py install
Alternatively you can download the Egg file and install it. Typically this is accomplished by using easy_install or pip:
sudo easy_install pygments
or
sudo pip install pygments
Eggs were inspired by Java's Jarfiles and they have quite a few features you should read about here
Python Package Formats:
uncompressed directories
zipimport (zip compressed directories)
A normal python package is just a directory containing an __init__.py file and an arbitrary number of additional modules or sub-packages. Python also has support for finding and loading source code within *.zip files as long as they are included on the PYTHONPATH (sys.path).
Installing Python Packages:
easy_install: the original egg installation tool, depends on setuptools
pip: currently the most popular way to install python packages. Similar to easy_install but more flexible and has some nice features like requirements files to help document dependencies and reproduce deployments.
pypm, apt, yum, fink, etc
Environment Management / Automated Deployment:
bento
buildout
virtualenv (and virtualenvwrapper)
The above tools are used to help automate and manage dependencies for a Python project. Basically they give you tools to describe what distributions your application requires and automate the installation of those specific versions of your dependencies.
Locations of Packages / Distributions:
site-packages
PYTHONPATH
the current working directory (depends on your OS and environment settings)
By default, installing a python distribution is going to drop it into the site-packages directory. That directory is usually something like /usr/lib/pythonX.Y/site-packages.
A simple programmatic way to find your site-packages directory:
from distuils import sysconfig
print sysconfig.get_python_lib()
Ways to modify your PYTHONPATH:
Python's import statement will only find packages that are located in one of the directories included in your PYTHONPATH.
You can inspect and change your path from within Python by accessing:
import sys
print sys.path
sys.path.append("/home/myname/lib")
Besides that, you can set the PYTHONPATH environment variable like you would any other environment variable on your OS or you could use:
.pth files: *.pth files located in directories that are already on your PYTHONPATH are read and each line of the *.pth file is added to your PYTHONPATH. Basically any time you would copy a package into a directory on your PYTHONPATH you could instead create a mypackages.pth. Read more about *.pth files: site module
egg-link files: Internal structure of python eggs they are a cross platform alternative to symbolic links. Creating an egg link file is similar to creating a pth file.
site.py modifications
To add the above /home/myname/lib to site-packages with a *.pth file you'd create a *.pth file. The name of the file doesn't matter but you should still probably choose something sensible.
Let's create myname.pth:
# myname.pth
/home/myname/lib
That's it. Drop that into sysconfig.get_python_lib() on your system or any other directory in your PYTHONPATH and /home/myname/lib will be added to the path.
For packaging question, this should help http://guide.python-distribute.org/
For import, the old article from Fredrik Lundh http://effbot.org/zone/import-confusion.htm still a very good starting point.
I recommend Tarek Ziadek's Book on Python. There's a chapter dedicated to packaging and distribution.
I don't think import needs to be explored (Python's namespacing and importing functionality is intuitive IMHO).
I use pip exclusively now. I haven't run into any issues with it.
However, the topic of packaging and distribution is something worth exploring. Instead of giving a lengthy answer, I will say this:
I learned how to package and distribute my own "packages" by simply copying how Pylons or many other open-source packages do it. I then combined that sort-of template with reading up of the docs to flesh it out even further and have come up with a solid distribution method.
When you grok package management and distribution for python (distutils and pypi) it's actually quite powerful. I like it a lot.
[edit]
I also wanted to add in a bit about virtualenv. USE IT. I create a virtualenv for every project and I always use --no-site-packages; I install all the packages I need for that particular project (even if it's something common amongst them all, like lxml) inside the virtualev. It keeps everything isolated and it's much easier for me to maintain the grouping in my head (rather than trying to keep track of what's where and for which version of python!)
[/edit]
The last time I had to worry about installing Python packages was two years ago working with Enthought, NumPy and MayaVi2. That experience gave me lingering nightmares related to quirky behavior installing & updating Python packages in non-standard locations (in $HOME/usr/local2.6/, for example).
Anyway, my work is taking me back to installing various Python packages. The CheeseShop Tutorial mentions DistUtils and EasyInstall in addition to Buildout! I am having a hard time finding one place that compares these (and other) PyPi installation tools, so I am hoping to tap into the StackOverflow community: What are the strengths & weaknesses of each installation tool?
First of all, regardless of installation tool you decide on, start using virtualenv --no-site-packages! That way, python packages are not installed globally and you can easily get back to where you were in old as well as new projects.
Now, your comparison is a little bit apples-to-pears as the tools you list are not mutually exclusive. However, I can wholly recommend Buildout. It will install python packages as well as other stuff and lets you automate installation and deployment of (complex) projects.
Also, I recommend looking into Fabric as a means to automate administrative tasks.
I've done quiet a bit of research on this topic(a couple of weeks worth) before settling down on using buildout for all of my projects.
DistUtils and EasyInstall in addition to Buildout!
The difficulty in creating one place to compare all of these tools is that they're all part of a same tool chain and are used together to create a predictable, reliable and flexible tool set.
For example, easy_install is used to install distutils packages from pypi(cheeseshop) to your system Python's site-packages directory. This drastically simplifies installation of packages to your system/global sys.path.
easy_install is very convenient for packages that are consistent for all projects. But, I find that I prefer to use system's easy_install to install packages that projects do not depend on. For example, github-cli I use with every project, because it allows me to interact with project's Github Issues from command line. I use this with projects, but it's for convenience and the project itself does not have dependancy on this package.
For managing project's dependancies, I use buildout. Buildout allows you to indicate specifically what version of packages your project depends on. I prefer buildout over pip-requirements.txt because buildout is declarative. With pip, you install the packages and at the end of the development you generate the requirements.txt file. With Buildout on the other hand, you modify the buildout.cfg before the package egg is added to your project. This forces me to be conscious of what packages I'm adding to the project.
Now, there is a matter of virtualenv. One of the most publicized features of virtualenv is obviously --no-site-packages option. I have not found that option to be particularly useful, because I use buildout. Buildout manages the sys.path and includes only the packages I ask tell it to include. It also, includes everything in system Python's site-packages but since I don't have anything there that I use in projects, I never have conflicts.
Also, I find that --no-site-packages only hinders my development process, because some packages I install using my sistem's packaging system. Usually, anything that has C libraries that need to be compiled, I install through the system's packaging system.
In the project's fabfile.py I include test function to test for presence of system packages that I install through system's package manager.
In summary, here is how I use these tools:
System's Package Manager(apt-get, yam, port, fink ...)
I use one of these to install python versions that I need on this system. I also use it to install packages like lxml which include c libraries.
easy_install
I use to install packages from pypi that I use on all projects, but projects are not dependant on these packages.
buildout
I use to manage dependancies of a project.
In my experience, this workflow has been very flexible, portable and easy to work with.
Distribute is a new fork of setuptools (easy_install), which should also be considered. Even Guido recommends it.
Buildout is orthogonal to the packaging --- you can use buildout with distribute.
Whenever I need to remind myself of the state of play, I look at these as a starting point:
The State of Python Packaging, a response to:
On packaging, linked from:
Tools of the Modern Python Hacker
I can't easily help you with finding the strength, but I can make it a bit harder, since it also depends on the platform you want to use.
For example if you need to install python packages on Gentoo (GNU/Liunx) based computers, you can easily use g-pypi to create ebuilds for all packages which use distutils (rather: a setup.py). That way they get completely integrated into your system and can be added, updated and removed like all your other tools. But it naturally only works for Gentoo-based systems.
Also you can use yolk to find out about all packages installed via easy_install on your system (not only on Gentoo).
When I write code, I simply use distutils (because it allows building portage ebuilds very easily) and sometimes basic setuptools features, or organize my programs so people can just download and run them from the program folder (ideally just unpack the source archive / clone the repository somewhere). This isn't the perfect solution, but until the core python team decides which way they want to move, I don't want to fix onto a path (anymore) which might disappear.
I have an open source project containing both Python and C code. I'm wondering that is there any use for distutils for me, because I'm planning to do a ubuntu/debian package. The C code is not something that I could or want to use as Python extension. C and Python programs communicate with TCP/IP through localhost.
So the bottom line here is that while I'm learning packaging, does the usage of distutils specific files only make me more confused since I can't use my C-code as Python extensions? Or should I divide my C and Python functionality to separate projects to be able to understand packaging concepts better?
distutils can be used to install end user programs, but it's most useful when using it for Python libraries, as it can create source packages and also install them in the correct place. For that I would say it's more or less required.
But for an end user Python program you can also use make or whatever you like and are used to, as you don't need to install any code in the Python site-packages directory, and you don't need to put your code onto PyPI and it doesn't need to be accessible from other Python-code.
I don't think distutils will be neither more or less complicated to use in installing an end-user program compared to other tools. All such install/packaging tools are hella-complex, as Cartman would have said.
Because it uses an unified python setup.py install command? distutils, or setuptools? Whatever, just use one of those.
For development, it's also really useful because you don't have to care where to find such and such dependency. As long as it's standard Python/basic system library stuff, setup.py should find it for you. With setup.py, you don't require anymore ./configure stuff or ugly autotools to create huge Makefiles. It just works (tm)
I've seen a good bit of setuptools bashing on the internets lately. Most recently, I read James Bennett's On packaging post on why no one should be using setuptools. From my time in #python on Freenode, I know that there are a few souls there who absolutely detest it. I would count myself among them, but I do actually use it.
I've used setuptools for enough projects to be aware of its deficiencies, and I would prefer something better. I don't particularly like the egg format and how it's deployed. With all of setuptools' problems, I haven't found a better alternative.
My understanding of tools like pip is that it's meant to be an easy_install replacement (not setuptools). In fact, pip uses some setuptools components, right?
Most of my packages make use of a setuptools-aware setup.py, which declares all of the dependencies. When they're ready, I'll build an sdist, bdist, and bdist_egg, and upload them to pypi.
If I wanted to switch to using pip, what kind of changes would I need to make to rid myself of easy_install dependencies? Where are the dependencies declared? I'm guessing that I would need to get away from using the egg format, and provide just source distributions. If so, how do i generate the egg-info directories? or do I even need to?
How would this change my usage of virtualenv? Doesn't virtualenv use easy_install to manage the environments?
How would this change my usage of the setuptools provided "develop" command? Should I not use that? What's the alternative?
I'm basically trying to get a picture of what my development workflow will look like.
Before anyone suggests it, I'm not looking for an OS-dependent solution. I'm mainly concerned with debian linux, but deb packages are not an option, for the reasons Ian Bicking outlines here.
pip uses Setuptools, and doesn't require any changes to packages. It actually installs packages with Setuptools, using:
python -c 'import setuptools; __file__="setup.py"; execfile(__file__)' \
install \
--single-version-externally-managed
Because it uses that option (--single-version-externally-managed) it doesn't ever install eggs as zip files, doesn't support multiple simultaneously installed versions of software, and the packages are installed flat (like python setup.py install works if you use only distutils). Egg metadata is still installed. pip also, like easy_install, downloads and installs all the requirements of a package.
In addition you can also use a requirements file to add other packages that should be installed in a batch, and to make version requirements more exact (without putting those exact requirements in your setup.py files). But if you don't make requirements files then you'd use it just like easy_install.
For your install_requires I don't recommend any changes, unless you have been trying to create very exact requirements there that are known to be good. I think there's a limit to how exact you can usefully be in setup.py files about versions, because you can't really know what the future compatibility of new libraries will be like, and I don't recommend you try to predict this. Requirement files are an alternate place to lay out conservative version requirements.
You can still use python setup.py develop, and in fact if you do pip install -e svn+http://mysite/svn/Project/trunk#egg=Project it will check that out (into src/project) and run setup.py develop on it. So that workflow isn't any different really.
If you run pip verbosely (like pip install -vv) you'll see a lot of the commands that are run, and you'll probably recognize most of them.
I'm writing this in April 2014. Be conscious of the date on anything written about Python packaging, distribution or installation. It looks like there's been some lessening of factiousness, improvement in implementations, PEP-standardizing and unifying of fronts in the last, say, three years.
For instance, the Python Packaging Authority is "a working group that maintains many of the relevant projects in Python packaging."
The python.org Python Packaging User Guide has Tool Recommendations and The Future of Python Packaging sections.
distribute was a branch of setuptools that was remerged in June 2013. The guide says, "Use setuptools to define projects and create Source Distributions."
As of PEP 453 and Python 3.4, the guide recommends, "Use pip to install Python packages from PyPI," and pip is included with Python 3.4 and installed in virtualenvs by pyvenv, which is also included. You might find the PEP 453 "rationale" section interesting.
There are also new and newish tools mentioned in the guide, including wheel and buildout.
I'm glad I read both of the following technical/semi-political histories.
By Martijn Faassen in 2009: A History of Python Packaging.
And by Armin Ronacher in June 2013 (the title is not serious): Python Packaging: Hate, hate, hate everywhere.
For starters, pip is really new. New, incomplete and largely un-tested in the real world.
It shows great promise but until such time as it can do everything that easy_install/setuptools can do it's not likely to catch on in a big way, certainly not in the corporation.
Easy_install/setuptools is big and complex - and that offends a lot of people. Unfortunately there's a really good reason for that complexity which is that it caters for a huge number of different use-cases. My own is supporting a large ( > 300 ) pool of desktop users, plus a similar sized grid with a frequently updated application. The notion that we could do this by allowing every user to install from source is ludicrous - eggs have proved themselves a reliable way to distribute my project.
My advice: Learn to use setuptools - it's really a wonderful thing. Most of the people who hate it do not understand it, or simply do not have the use-case for as full-featured distribution system.
:-)