Why do we need to install python modules

Why do we need to install python modules - python

Lets imagine I created new module. Why do I need to install it via setup file? I mean I can just add my module to PYTHONPATH variable and thats all. Thanks

For a simple one-file module, sure, that's enough.
But a setup.py file also lets you create a distribution, associate metadata with the distribution (author, homepage, description, etc.), register your package with the Python Package Index and most of all, lets you define what other packages might be needed to run your code. setup.py is not just for installing your module.
Installing a module with a setup.py based on setuptools also gives you additional functionality, such as support for namespaced packages (multiple distributions sharing a top-level name) and the ability to install multiple versions of a package side by side.

Related

install a library if necessary

i wanted to send a short code i was proud of to a fellow student and realized he won't be able to run it since there is no reason for him to have installed the library.
installing is of course super easy - but i realized this could happen often, mainly with beginners - wanted to build a simple function for it:
def smart_import(package_name):
try:
import package_name
except ImportError:
import pip
pip.main(['install', package_name])
problem is that i dont really know how to pass in the name of the package as a value that could be called by import
thought of converting a string back but that seems more complicated then i thought

This is a bad idea for many reasons, the main one being that people generally don't expect a Python function call to automatically attempt to install software on their machine.
Here are some other problems this approach has:
The import name is not always corresponding to the distribution name. For example dateutil module is provided by python-dateutil distribution.
If you try to use smart_import from another module, it will leave the name imported into the wrong namespace.
Some distributions export multiple top-level imports, e.g. setuptools provides setuptools, easy_install, and pkg_resources.
In some cases, pip itself may not be installed.
The script might not have permission to install packages.
The user might want to install distributions into their homedir explicitly with --user, and your script itself can't know that.
The invocation of pip might need to change depending on whether or not you are within a virtualenv.
The installation might attempt to pull in dependencies which cause conflicts with other distributions a user has installed.
IDEs likely won't see that a module dynamically imported is there, and may squiggly underline their subsequent use as a name which can not be resolved.
What to do instead:
Package your code up into its own distribution. Write a setup.py which mentions the dependencies using the install_requires argument to in the setup call. Ask your fellow student to pip install mypackage, and the dependencies will be collected at install time rather than at runtime. You can send your distribution directly to them (as a mypackage-0.1.tar.gz file or as a wheel). Or you can upload it to an index such as the test pypi.
The Python Packaging User Guide is a beginner-friendly resource describing how to create and upload your own distribution.

Why install Python packages

Why we have to install the python packages before using them?
I am currently working on a small python mysql program. What i tried to download the python connector module from mysql webpage and simply unzip it and place it in the same folder of my code.
And I can import the module properly.
So what is the meaning of installing those packages? Can I use those packages like matplotlib, numpy without installing them ?
Is it possible to have all the required packages installed on a folder so that i can move it to another computer and run my program with only CPython installed (I don't want to install any package on this computer)?

it's not that simple :-)
some packages have dependencies, you also need to download and extract their dependencies (you need pacakge x,and package x uses y) pakcage manager handles that
some package have some c code (they need to be compiled before use (ujson or postgres module) package manager handles that
when your share your code instead of sharing dependencies you simply add a file containing the list of dependencies (requirements.txt) and other user can simply install all dependencies using package manager

Installing a python package enables us to use it anywhere on our system. If we just place the package in the same directory as our script then it may well work, but only for scripts in that directory.
Some packages also rely on others to function properly, and the installation of a package may well install those pre-requisite packages for you. You may be able to do this manually, but you'd have to put them all in the same directory as your script every time you wanted to run it.
So installing the packages is the easiest way to use them.
You don't have to install them, and in some cases you wouldn't install them on your system; if you had split your code across two files and imported one file at the top of the other for example.

In fact, you don't really need install package on your system.
But if you install it, you can use these packages every where on your system.
Also, you can create a requirement.txt file to enable install all packages that you need on other computer. You can check this manual https://pip.pypa.io/en/stable/user_guide/#requirements-files

requirements.txt vs setup.py

I started working with Python. I've added requirements.txt and setup.py to my project. But, I am still confused about the purpose of both files. I have read that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things. But I am not certain this is accurate.
How are those two files truly intended to be used?

requirements.txt:
This helps you to set up your development environment.
Programs like pip can be used to install all packages listed in the file in one fell swoop. After that you can start developing your python script. Especially useful if you plan to have others contribute to the development or use virtual environments.
This is how you use it:
pip install -r requirements.txt
It can be produced easily by pip itself:
pip freeze > requirements.txt
pip automatically tries to only add packages that are not installed by default, so the produced file is pretty minimal.
setup.py:
This helps you to create packages that you can redistribute.
The setup.py script is meant to install your package on the end user's system, not to prepare the development environment as pip install -r requirements.txt does. See this answer for more details on setup.py.
The dependencies of your project are listed in both files.

The short answer is that requirements.txt is for listing package requirements only. setup.py on the other hand is more like an installation script. If you don't plan on installing the python code, typically you would only need requirements.txt.
The file setup.py describes, in addition to the package dependencies, the set of files and modules that should be packaged (or compiled, in the case of native modules (i.e., written in C)), and metadata to add to the python package listings (e.g. package name, package version, package description, author, ...).
Because both files list dependencies, this can lead to a bit of duplication. Read below for details.
requirements.txt
This file lists python package requirements. It is a plain text file (optionally with comments) that lists the package dependencies of your python project (one per line). It does not describe the way in which your python package is installed. You would generally consume the requirements file with pip install -r requirements.txt.
The filename of the text file is arbitrary, but is often requirements.txt by convention. When exploring source code repositories of other python packages, you might stumble on other names, such as dev-dependencies.txt or dependencies-dev.txt. Those serve the same purpose as dependencies.txt but generally list additional dependencies of interest to developers of the particular package, namely for testing the source code (e.g. pytest, pylint, etc.) before release. Users of the package generally wouldn't need the entire set of developer dependencies to run the package.
If multiplerequirements-X.txt variants are present, then usually one will list runtime dependencies, and the other build-time, or test dependencies. Some projects also cascade their requirements file, i.e. when one requirements file includes another file (example). Doing so can reduce repetition.
setup.py
This is a python script which uses the setuptools module to define a python package (name, files included, package metadata, and installation). It will, like requirements.txt, also list runtime dependencies of the package. Setuptools is the de-facto way to build and install python packages, but it has its shortcomings, which over time have sprouted the development of new "meta-package managers", like pip. Example shortcomings of setuptools are its inability to install multiple versions of the same package, and lack of an uninstall command.
When a python user does pip install ./pkgdir_my_module (or pip install my-module), pip will run setup.py in the given directory (or module). Similarly, any module which has a setup.py can be pip-installed, e.g. by running pip install . from the same folder.
Do I really need both?
Short answer is no, but it's nice to have both. They achieve different purposes, but they can both be used to list your dependencies.
There is one trick you may consider to avoid duplicating your list of dependencies between requirements.txt and setup.py. If you have written a fully working setup.py for your package already, and your dependencies are mostly external, you could consider having a simple requirements.txt with only the following:
# requirements.txt
#
# installs dependencies from ./setup.py, and the package itself,
# in editable mode
-e .
# (the -e above is optional). you could also just install the package
# normally with just the line below (after uncommenting)
# .
The -e is a special pip install option which installs the given package in editable mode. When pip -r requirements.txt is run on this file, pip will install your dependencies via the list in ./setup.py. The editable option will place a symlink in your install directory (instead of an egg or archived copy). It allows developers to edit code in place from the repository without reinstalling.
You can also take advantage of what's called "setuptools extras" when you have both files in your package repository. You can define optional packages in setup.py under a custom category, and install those packages from just that category with pip:
# setup.py
from setuptools import setup
setup(
name="FOO"
...
extras_require = {
'dev': ['pylint'],
'build': ['requests']
}
...
)
and then, in the requirements file:
# install packages in the [build] category, from setup.py
# (path/to/mypkg is the directory where setup.py is)
-e path/to/mypkg[build]
This would keep all your dependency lists inside setup.py.
Note: You would normally execute pip and setup.py from a sandbox, such as those created with the program virtualenv. This will avoid installing python packages outside the context of your project's development environment.

For the sake of completeness, here is how I see it in 3 4 different angles.
Their design purposes are different
This is the precise description quoted from the official documentation (emphasis mine):
Whereas install_requires (in setup.py) defines the dependencies for a single project, Requirements Files are often used to define the requirements for a complete Python environment.
Whereas install_requires requirements are minimal, requirements files often contain an exhaustive listing of pinned versions for the purpose of achieving repeatable installations of a complete environment.
But it might still not easy to be understood, so in next section, there come 2 factual examples to demonstrate how the 2 approaches are supposed to be used, differently.
Their actual usages are therefore (supposed to be) different
If your project foo is going to be released as a standalone library (meaning, others would probably do import foo), then you (and your downstream users) would want to have a flexible declaration of dependency, so that your library would not (and it must not) be "picky" about what exact version of YOUR dependencies should be. So, typically, your setup.py would contain lines like this:
install_requires=[
'A>=1,<2',
'B>=2'
]
If you just want to somehow "document" or "pin" your EXACT current environment for your application bar, meaning, you or your users would like to use your application bar as-is, i.e. running python bar.py, you may want to freeze your environment so that it would always behave the same. In such case, your requirements file would look like this:
A==1.2.3
B==2.3.4
# It could even contain some dependencies NOT strickly required by your library
pylint==3.4.5
In reality, which one do I use?
If you are developing an application bar which will be used by python bar.py, even if that is "just script for fun", you are still recommended to use requirements.txt because, who knows, next week (which happens to be Christmas) you would receive a new computer as a gift, so you would need to setup your exact environment there again.
If you are developing a library foo which will be used by import foo, you have to prepare a setup.py. Period.
But you may still choose to also provide a requirements.txt at the same time, which can:
(a) either be in the A==1.2.3 style (as explained in #2 above);
(b) or just contain a magical single .
.
The latter is essentially using the conventional requirements.txt habit to document your installation step is pip install ., which means to "install the requirements based on setup.py" while without duplication. Personally I consider this last approach kind of blurs the line, adds to the confusion, but it is nonetheless a convenient way to explicitly opt out for dependency pinning when running in a CI environment. The trick was derived from an approach mentioned by Python packaging maintainer Donald in his blog post.
Different lower bounds.
Assuming there is an existing engine library with this history:
engine 1.1.0 Use steam
...
engine 1.2.0 Internal combustion is invented
engine 1.2.1 Fix engine leaking oil
engine 1.2.2 Fix engine overheat
engine 1.2.3 Fix occasional engine stalling
engine 2.0.0 Introducing nuclear reactor
You follow the above 3 criteria and correctly decided that your new library hybrid-engine would use a setup.py to declare its dependency engine>=1.2.0,<2, and then your separated application reliable-car would use requirements.txt to declare its dependency engine>=1.2.3,<2 (or you may want to just pin engine==1.2.3). As you see, your choice for their lower bound number are still subtly different, and neither of them uses the latest engine==2.0.0. And here is why.
hybrid-engine depends on engine>=1.2.0 because, the needed add_fuel() API was first introduced in engine 1.2.0, and that capability is the necessity of hybrid-engine, regardless of whether there might be some (minor) bugs inside such version and been fixed in subsequent versions 1.2.1, 1.2.2 and 1.2.3.
reliable-car depends on engine>=1.2.3 because that is the earliest version WITHOUT known issues, so far. Sure there are new capabilities in later versions, i.e. "nuclear reactor" introduced in engine 2.0.0, but they are not necessarily desirable for project reliable-car. (Your yet another new project time-machine would likely use engine>=2.0.0, but that is a different topic, though.)

TL;DR
requirements.txt lists concrete dependencies
setup.py lists abstract dependencies
A common misunderstanding with respect to dependency management in Python is whether you need to use a requirements.txt or setup.py file in order to handle dependencies.
The chances are you may have to use both in order to ensure that dependencies are handled appropriately in your Python project.
The requirements.txt file is supposed to list the concrete dependencies. In other words, it should list pinned dependencies (using the == specifier). This file will then be used in order to create a working virtual environment that will have all the dependencies installed, with the specified versions.
On the other hand, the setup.py file should list the abstract dependencies. This means that it should list the minimal dependencies for running the project. Apart from dependency management though, this file also serves the package distribution (say on PyPI).
For a more comprehensive read, you can read the article requirements.txt vs setup.py in Python on TDS.
Now going forward and as of PEP-517 and PEP-518, you may have to use a pyproject.toml in order to specify that you want to use setuptools as the build-tool and an additional setup.cfg file to specify the details.
For more details you can read the article setup.py vs setup.cfg in Python.

packaging scientific project in python

I am trying to build a package for an apps in python. It uses sklearn, pandas, numpy, boto and some other scientific module from anaconda. Being very unexperienced with python packaging, I have various questions:
1- I have some confidential files .py in my project which I don't want anyone to be able to see. In java I would have defined private files and classes but I am completely lost in python. What is the "good practice" to deal with these private modules? Can anyone link me some tutorial?
2- What is the best way to package my apps? I don't want to publish anything on Pypi, I only need it to execute on Google App engine for instance. I tried a standalone package with PyInstaller but I could not finish it because of numpy and other scipy packages which makes it hard. Is there a simple way to package in a private way python projects made with anaconda?
3- Since I want to build more apps in a close future, shall I try to make sub-packages in order to use them for other apps?

The convention is to lead with a single underscore _ if something is internal. Note that this is a convention. If someone really wants to use it, they still can. Your code is not strictly confidential.
Take a look at http://python-packaging-user-guide.readthedocs.org/en/latest/. You don't need to publish to pypi to create a Python package that uses tools such as pip. You can create a project with a setup.py file and a requirements.txt file and then use pip to install your package from wherever you have it (e.g., a local directory or a repository on github). If you take this approach then pip will install all the dependencies you list.
If you want to reuse your package, just include it in requirements.txt and the install_requires parameter in setup.py (see http://python-packaging-user-guide.readthedocs.org/en/latest/requirements/). For example, if you install your package with pip install https://github/myname/mypackage.git then you could include https://github/myname/mypackage.git in your requirements.txt file in future projects.

Can we shed some definitive light on how python packaging and import works?

I had my fair chance of getting through the python management of modules, and every time is a challenge: packaging is not what people do every day, and it becomes a burden to learn, and a burden to remember, even when you actually do it, since this happens normally once.
I would like to collect here the definitive overview of how import, package management and distribution works in python, so that this question becomes the definitive explanation for all the magic that happens under the hood. Although I understand the broad level of the question, these things are so intertwined that any focused answer will not solve the main problem: understand how all works, what is outdated, what is current, what are just alternatives for the same task, what are the quirks.
The list of keywords to refer to is the following, but this is just a sample out of the bunch. There's a lot more and you are welcome to add additional details.
PyPI
setuptools / Distribute
distutils
eggs
egg-link
pip
zipimport
site.py
site-packages
.pth files
virtualenv
handling of compiled modules in eggs (with and without installation via easy_install)
use of get_data()
pypm
bento
PEP 376
the cheese shop
eggsecutable
Linking to other answers is probably a good idea. As I said, this question is for the high-level overview.

For the most part, this is an attempt to look at the packaging/distribution side, not the mechanics of import. Unfortunately, packaging is the place where Python provides way more than one way to do it. I'm just trying to get the ball rolling, hopefully others will help fill what I miss or point out mistakes.
First of all there's some messy terminology here. A directory containing an __init__.py file is a package. However, most of what we're talking about here are specific versions of packages published on PyPI, one of it's mirrors, or in a vendor specific package management system like Debian's Apt, Redhat's Yum, Fink, Macports, Homebrew, or ActiveState's pypm.
These published packages are what folks are trying to call "Distributions" going forward in an attempt to use "Package" only as the Python language construct. You can see some of that usage in PEP-376 PEP-376.
Now, your list of keywords relate to several different aspects of the Python Ecosystem:
Finding and publishing python distributions:
PyPI (aka the cheese shop)
PyPI Mirrors
Various package management tools / systems: apt, yum, fink, macports, homebrew
pypm (ActiveState's alternative to PyPI)
The above are all services that provide a place to publish Python distributions in various formats. Some, like PyPI mirrors and apt / yum repositories can be run on your local machine or within your companies network but folks typically use the official ones. Most, if not all provide a tool (or multiple tools in the case of PyPI) to help find and download distributions.
Libraries used to create and install distributions:
setuptools / Distribute
distutils
Distutils is the standard infrastructure on which Python packages are compiled and built into distributions. There's a ton of functionality in distutils but most folks just know:
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward#python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
And to some extent that's a most of what you need. With the prior 9 lines of code you have enough information to install a pure Python package and also the minimal metadata required to publish that package a distribution on PyPI.
Setuptools provides the hooks necessary to support the Egg format and all of it's features and foibles. Distribute is an alternative to Setuptools that adds some features while trying to be mostly backwards compatible. I believe Distribute is going to be included in Python 3 as the successor to Distutil's from distutils.core import setup.
Both Setuptools and Distribute provide a custom version of the distutils setup command
that does useful things like support the Egg format.
Python Distribution Formats:
source
eggs
Distributions are typically provided either as source archives (tarball or zipfile). The standard way to install a source distribution is by downloading and uncompressing the archive and then running the setup.py file inside.
For example, the following will download, build, and install the Pygments syntax highlighting library:
curl -O -G http://pypi.python.org/packages/source/P/Pygments/Pygments-1.4.tar.gz
tar -zxvf Pygments-1.4.tar.gz
cd Pygments-1.4
python setup.py build
sudo python setup.py install
Alternatively you can download the Egg file and install it. Typically this is accomplished by using easy_install or pip:
sudo easy_install pygments
or
sudo pip install pygments
Eggs were inspired by Java's Jarfiles and they have quite a few features you should read about here
Python Package Formats:
uncompressed directories
zipimport (zip compressed directories)
A normal python package is just a directory containing an __init__.py file and an arbitrary number of additional modules or sub-packages. Python also has support for finding and loading source code within *.zip files as long as they are included on the PYTHONPATH (sys.path).
Installing Python Packages:
easy_install: the original egg installation tool, depends on setuptools
pip: currently the most popular way to install python packages. Similar to easy_install but more flexible and has some nice features like requirements files to help document dependencies and reproduce deployments.
pypm, apt, yum, fink, etc
Environment Management / Automated Deployment:
bento
buildout
virtualenv (and virtualenvwrapper)
The above tools are used to help automate and manage dependencies for a Python project. Basically they give you tools to describe what distributions your application requires and automate the installation of those specific versions of your dependencies.
Locations of Packages / Distributions:
site-packages
PYTHONPATH
the current working directory (depends on your OS and environment settings)
By default, installing a python distribution is going to drop it into the site-packages directory. That directory is usually something like /usr/lib/pythonX.Y/site-packages.
A simple programmatic way to find your site-packages directory:
from distuils import sysconfig
print sysconfig.get_python_lib()
Ways to modify your PYTHONPATH:
Python's import statement will only find packages that are located in one of the directories included in your PYTHONPATH.
You can inspect and change your path from within Python by accessing:
import sys
print sys.path
sys.path.append("/home/myname/lib")
Besides that, you can set the PYTHONPATH environment variable like you would any other environment variable on your OS or you could use:
.pth files: *.pth files located in directories that are already on your PYTHONPATH are read and each line of the *.pth file is added to your PYTHONPATH. Basically any time you would copy a package into a directory on your PYTHONPATH you could instead create a mypackages.pth. Read more about *.pth files: site module
egg-link files: Internal structure of python eggs they are a cross platform alternative to symbolic links. Creating an egg link file is similar to creating a pth file.
site.py modifications
To add the above /home/myname/lib to site-packages with a *.pth file you'd create a *.pth file. The name of the file doesn't matter but you should still probably choose something sensible.
Let's create myname.pth:
# myname.pth
/home/myname/lib
That's it. Drop that into sysconfig.get_python_lib() on your system or any other directory in your PYTHONPATH and /home/myname/lib will be added to the path.

For packaging question, this should help http://guide.python-distribute.org/
For import, the old article from Fredrik Lundh http://effbot.org/zone/import-confusion.htm still a very good starting point.

I recommend Tarek Ziadek's Book on Python. There's a chapter dedicated to packaging and distribution.

I don't think import needs to be explored (Python's namespacing and importing functionality is intuitive IMHO).
I use pip exclusively now. I haven't run into any issues with it.
However, the topic of packaging and distribution is something worth exploring. Instead of giving a lengthy answer, I will say this:
I learned how to package and distribute my own "packages" by simply copying how Pylons or many other open-source packages do it. I then combined that sort-of template with reading up of the docs to flesh it out even further and have come up with a solid distribution method.
When you grok package management and distribution for python (distutils and pypi) it's actually quite powerful. I like it a lot.
[edit]
I also wanted to add in a bit about virtualenv. USE IT. I create a virtualenv for every project and I always use --no-site-packages; I install all the packages I need for that particular project (even if it's something common amongst them all, like lxml) inside the virtualev. It keeps everything isolated and it's much easier for me to maintain the grouping in my head (rather than trying to keep track of what's where and for which version of python!)
[/edit]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.