Is it Possible to Install Part of Python Package Via Pip? - python

I have an internal utility library that is used by many projects. There is quite a bit of overlap between the projects in the code they pull from the utility library, but as the library grows, so does the amount of extra stuff any individual project gets that it won't be using. This wouldn't be an issue if the library consisted of only python, but the library also bundles in binaries.
Example-
psycopg2 is used in a handful of places within the utility library, but not all projects need db access. Because the development environment isn't the same as the production environment, the utility library also includes psycopg2 binaries for the prod environment.
This grows with openssl libraries, pandas, numpy, scipy, pyarrow, etc. The result is that a small, 50 line, single purpose script that may need db access is bundled into a 100mb+ deployment package.
So what I'd like to do is carve up the utility library into pieces where the projects down stream can choose which pieces to pull in, but keep the utility library code in one easy-to-manage place. That way, this small single purpose application can choose to import internal-util#core, internal-util#db and not include internal-util#numpy and internal-util#openssl
Is what i'm describing possible to do?

Not directly, to my best knowledge. pip installs a package fully, or not at all.
However, if you're careful in your package about how you import things that may require psycopg2 or someotherlargebinarything, you could use the extras_require feature and thus have the package's users choose which dependencies they want to pull in:
setup(
# ...
name='myawesometoolbelt',
extras_require={
'db': ['psycopg2'],
'math': ['numpy'],
},
)
and then, in your requirements.txt, or pip invocation,
myawesometoolbelt[db,math]

Have you tried looking at pip freeze > requirements.txt and pip install -r requirements.txt?
Once you generated your pip list via pip freeze, it is possible to edit which packages you want installed and which to omit from the requirements.txt generated.
You can then pip install -r requirements.txt the things you want back in.

Related

Updating dependencies of downstream projects using pip requirements.txt

I have a python package that is used by other applications across an organization, let's call it buildtools.
Other applications in my organization have installed this package via
pip install git+https://${OAUTH_TOKEN}:x-oauth-basic#github.com/my_organization/buildtools#egg=buildtools
I want to add a new feature to buildtools that requires a 3rd party package, let's just say its requests. So within buildtools I add requests to requirements.txt, import it, and it's all good.
But none of the other applications in my organization have requests as one of their dependencies in requirements.txt.
When I merge my new code in and update the package, I believe we will run into some ImportError: No module named requests errors in the downstream applications that use buildtools.
How can I ensure that any application that uses the buildtools package gets the requests package installed when they get the latest buildtools?
In other words, how can I update buildtools's dependencies recursively?
I am aware that I could add requests to requirements.txt across all the applications in my organization that uses buildtools, but I'm trying to avoid that.
Why don't you just run
pip install -r requirements.txt
as discussed, e.g. here?
That's the best and most painless way to update/install required packages recursively.
After further research I found that install_requires within setup.py is exactly what I was looking for. This example explains it well.

Why install Python packages

Why we have to install the python packages before using them?
I am currently working on a small python mysql program. What i tried to download the python connector module from mysql webpage and simply unzip it and place it in the same folder of my code.
And I can import the module properly.
So what is the meaning of installing those packages? Can I use those packages like matplotlib, numpy without installing them ?
Is it possible to have all the required packages installed on a folder so that i can move it to another computer and run my program with only CPython installed (I don't want to install any package on this computer)?
it's not that simple :-)
some packages have dependencies, you also need to download and extract their dependencies (you need pacakge x,and package x uses y) pakcage manager handles that
some package have some c code (they need to be compiled before use (ujson or postgres module) package manager handles that
when your share your code instead of sharing dependencies you simply add a file containing the list of dependencies (requirements.txt) and other user can simply install all dependencies using package manager
Installing a python package enables us to use it anywhere on our system. If we just place the package in the same directory as our script then it may well work, but only for scripts in that directory.
Some packages also rely on others to function properly, and the installation of a package may well install those pre-requisite packages for you. You may be able to do this manually, but you'd have to put them all in the same directory as your script every time you wanted to run it.
So installing the packages is the easiest way to use them.
You don't have to install them, and in some cases you wouldn't install them on your system; if you had split your code across two files and imported one file at the top of the other for example.
In fact, you don't really need install package on your system.
But if you install it, you can use these packages every where on your system.
Also, you can create a requirement.txt file to enable install all packages that you need on other computer. You can check this manual https://pip.pypa.io/en/stable/user_guide/#requirements-files

requirements.txt vs setup.py

I started working with Python. I've added requirements.txt and setup.py to my project. But, I am still confused about the purpose of both files. I have read that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things. But I am not certain this is accurate.
How are those two files truly intended to be used?
requirements.txt:
This helps you to set up your development environment.
Programs like pip can be used to install all packages listed in the file in one fell swoop. After that you can start developing your python script. Especially useful if you plan to have others contribute to the development or use virtual environments.
This is how you use it:
pip install -r requirements.txt
It can be produced easily by pip itself:
pip freeze > requirements.txt
pip automatically tries to only add packages that are not installed by default, so the produced file is pretty minimal.
setup.py:
This helps you to create packages that you can redistribute.
The setup.py script is meant to install your package on the end user's system, not to prepare the development environment as pip install -r requirements.txt does. See this answer for more details on setup.py.
The dependencies of your project are listed in both files.
The short answer is that requirements.txt is for listing package requirements only. setup.py on the other hand is more like an installation script. If you don't plan on installing the python code, typically you would only need requirements.txt.
The file setup.py describes, in addition to the package dependencies, the set of files and modules that should be packaged (or compiled, in the case of native modules (i.e., written in C)), and metadata to add to the python package listings (e.g. package name, package version, package description, author, ...).
Because both files list dependencies, this can lead to a bit of duplication. Read below for details.
requirements.txt
This file lists python package requirements. It is a plain text file (optionally with comments) that lists the package dependencies of your python project (one per line). It does not describe the way in which your python package is installed. You would generally consume the requirements file with pip install -r requirements.txt.
The filename of the text file is arbitrary, but is often requirements.txt by convention. When exploring source code repositories of other python packages, you might stumble on other names, such as dev-dependencies.txt or dependencies-dev.txt. Those serve the same purpose as dependencies.txt but generally list additional dependencies of interest to developers of the particular package, namely for testing the source code (e.g. pytest, pylint, etc.) before release. Users of the package generally wouldn't need the entire set of developer dependencies to run the package.
If multiplerequirements-X.txt variants are present, then usually one will list runtime dependencies, and the other build-time, or test dependencies. Some projects also cascade their requirements file, i.e. when one requirements file includes another file (example). Doing so can reduce repetition.
setup.py
This is a python script which uses the setuptools module to define a python package (name, files included, package metadata, and installation). It will, like requirements.txt, also list runtime dependencies of the package. Setuptools is the de-facto way to build and install python packages, but it has its shortcomings, which over time have sprouted the development of new "meta-package managers", like pip. Example shortcomings of setuptools are its inability to install multiple versions of the same package, and lack of an uninstall command.
When a python user does pip install ./pkgdir_my_module (or pip install my-module), pip will run setup.py in the given directory (or module). Similarly, any module which has a setup.py can be pip-installed, e.g. by running pip install . from the same folder.
Do I really need both?
Short answer is no, but it's nice to have both. They achieve different purposes, but they can both be used to list your dependencies.
There is one trick you may consider to avoid duplicating your list of dependencies between requirements.txt and setup.py. If you have written a fully working setup.py for your package already, and your dependencies are mostly external, you could consider having a simple requirements.txt with only the following:
# requirements.txt
#
# installs dependencies from ./setup.py, and the package itself,
# in editable mode
-e .
# (the -e above is optional). you could also just install the package
# normally with just the line below (after uncommenting)
# .
The -e is a special pip install option which installs the given package in editable mode. When pip -r requirements.txt is run on this file, pip will install your dependencies via the list in ./setup.py. The editable option will place a symlink in your install directory (instead of an egg or archived copy). It allows developers to edit code in place from the repository without reinstalling.
You can also take advantage of what's called "setuptools extras" when you have both files in your package repository. You can define optional packages in setup.py under a custom category, and install those packages from just that category with pip:
# setup.py
from setuptools import setup
setup(
name="FOO"
...
extras_require = {
'dev': ['pylint'],
'build': ['requests']
}
...
)
and then, in the requirements file:
# install packages in the [build] category, from setup.py
# (path/to/mypkg is the directory where setup.py is)
-e path/to/mypkg[build]
This would keep all your dependency lists inside setup.py.
Note: You would normally execute pip and setup.py from a sandbox, such as those created with the program virtualenv. This will avoid installing python packages outside the context of your project's development environment.
For the sake of completeness, here is how I see it in 3 4 different angles.
Their design purposes are different
This is the precise description quoted from the official documentation (emphasis mine):
Whereas install_requires (in setup.py) defines the dependencies for a single project, Requirements Files are often used to define the requirements for a complete Python environment.
Whereas install_requires requirements are minimal, requirements files often contain an exhaustive listing of pinned versions for the purpose of achieving repeatable installations of a complete environment.
But it might still not easy to be understood, so in next section, there come 2 factual examples to demonstrate how the 2 approaches are supposed to be used, differently.
Their actual usages are therefore (supposed to be) different
If your project foo is going to be released as a standalone library (meaning, others would probably do import foo), then you (and your downstream users) would want to have a flexible declaration of dependency, so that your library would not (and it must not) be "picky" about what exact version of YOUR dependencies should be. So, typically, your setup.py would contain lines like this:
install_requires=[
'A>=1,<2',
'B>=2'
]
If you just want to somehow "document" or "pin" your EXACT current environment for your application bar, meaning, you or your users would like to use your application bar as-is, i.e. running python bar.py, you may want to freeze your environment so that it would always behave the same. In such case, your requirements file would look like this:
A==1.2.3
B==2.3.4
# It could even contain some dependencies NOT strickly required by your library
pylint==3.4.5
In reality, which one do I use?
If you are developing an application bar which will be used by python bar.py, even if that is "just script for fun", you are still recommended to use requirements.txt because, who knows, next week (which happens to be Christmas) you would receive a new computer as a gift, so you would need to setup your exact environment there again.
If you are developing a library foo which will be used by import foo, you have to prepare a setup.py. Period.
But you may still choose to also provide a requirements.txt at the same time, which can:
(a) either be in the A==1.2.3 style (as explained in #2 above);
(b) or just contain a magical single .
.
The latter is essentially using the conventional requirements.txt habit to document your installation step is pip install ., which means to "install the requirements based on setup.py" while without duplication. Personally I consider this last approach kind of blurs the line, adds to the confusion, but it is nonetheless a convenient way to explicitly opt out for dependency pinning when running in a CI environment. The trick was derived from an approach mentioned by Python packaging maintainer Donald in his blog post.
Different lower bounds.
Assuming there is an existing engine library with this history:
engine 1.1.0 Use steam
...
engine 1.2.0 Internal combustion is invented
engine 1.2.1 Fix engine leaking oil
engine 1.2.2 Fix engine overheat
engine 1.2.3 Fix occasional engine stalling
engine 2.0.0 Introducing nuclear reactor
You follow the above 3 criteria and correctly decided that your new library hybrid-engine would use a setup.py to declare its dependency engine>=1.2.0,<2, and then your separated application reliable-car would use requirements.txt to declare its dependency engine>=1.2.3,<2 (or you may want to just pin engine==1.2.3). As you see, your choice for their lower bound number are still subtly different, and neither of them uses the latest engine==2.0.0. And here is why.
hybrid-engine depends on engine>=1.2.0 because, the needed add_fuel() API was first introduced in engine 1.2.0, and that capability is the necessity of hybrid-engine, regardless of whether there might be some (minor) bugs inside such version and been fixed in subsequent versions 1.2.1, 1.2.2 and 1.2.3.
reliable-car depends on engine>=1.2.3 because that is the earliest version WITHOUT known issues, so far. Sure there are new capabilities in later versions, i.e. "nuclear reactor" introduced in engine 2.0.0, but they are not necessarily desirable for project reliable-car. (Your yet another new project time-machine would likely use engine>=2.0.0, but that is a different topic, though.)
TL;DR
requirements.txt lists concrete dependencies
setup.py lists abstract dependencies
A common misunderstanding with respect to dependency management in Python is whether you need to use a requirements.txt or setup.py file in order to handle dependencies.
The chances are you may have to use both in order to ensure that dependencies are handled appropriately in your Python project.
The requirements.txt file is supposed to list the concrete dependencies. In other words, it should list pinned dependencies (using the == specifier). This file will then be used in order to create a working virtual environment that will have all the dependencies installed, with the specified versions.
On the other hand, the setup.py file should list the abstract dependencies. This means that it should list the minimal dependencies for running the project. Apart from dependency management though, this file also serves the package distribution (say on PyPI).
For a more comprehensive read, you can read the article requirements.txt vs setup.py in Python on TDS.
Now going forward and as of PEP-517 and PEP-518, you may have to use a pyproject.toml in order to specify that you want to use setuptools as the build-tool and an additional setup.cfg file to specify the details.
For more details you can read the article setup.py vs setup.cfg in Python.

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?
You can think of virtualenv as an isolation for every package you install using pip. It is a simple way to handle different versions of python and packages. For instance you have two projects which use same packages but different versions of them. So, by using virtualenv you can isolate those two projects and install different version of packages separately, not on your working system.
Now, let's say, you want work on a project with your friend. In order to have the same packages installed you have to share somehow what versions and which packages your project depends on. If you are delivering a reusable package (a library) then you need to distribute it and here where setup.py helps. You can learn more in Quick Start
However, if you work on a web site, all you need is to put libraries versions into a separate file. Best practice is to create separate requirements for tests, development and production. In order to see the format of the file - write pip freeze. You will be presented with a list of packages installed on the system (or in the virtualenv) right now. Put it into the file and you can install it later on another pc, with completely clear virtualenv using pip install -r development.txt
And one more thing, please do not put strict versions of packages like pip freeze shows, most of time you want >= at least X.X version. And good news here is that pip handles dependencies by its own. It means you do not have to put dependent packages there, pip will sort it out.
Talking about deploy, you may want to check tox, a tool for managing virtualenvs. It helps a lot with deploy.
Python default package path always point to system environment, that need Administrator access to install. Virtualenv able to localised the installation to an isolated environment.
For deployment/distribution of package, you can choose to
Distribute by source code. User need to run python setup.py --install, or
Pack your python package and upload to Pypi or custom Devpi. So the user can simply use pip install <yourpackage>
However, as you notice the issue on top : without virtualenv, they user need administrator access to install any python package.
In addition, the Pypi package worlds contains a certain amount of badly tested package that doesn't work out of the box.
Note : virtualenv itself is actually a hack to achieve isolation.

Distribute python helper application inside other product for developers and users

Primary product has several python scripts which help users and developers to work with primary application.
I want found way to distribute this scripts with dependencies in following way:
developers of product should just get a copy of repository, build it, and receive the workable solution. No root access, no downloads acceptable.
users of product should just get a copy of tar.gz, unpack it, and use. No root access, not downloads, no additional build steps acceptable.
users of product should just get a rpm/deb packages, install it, and use it. No additional downloads acceptable.
All python distribution way requires the run of additional scripts (it is denied for tar.gz), and then download dependencies often.
So, for my task I need python-fabric with all dependencies.
Right now script wrote with standard python libraries and very ugly, I want switch it to fabric.
How to right support all these requirements?
pip can install from local tarballs. You could enumerate them in a requirements file; then to install, run from within a virtualenv:
pip install -r requirements.txt
If there are C extensions among dependencies then your developers should have a compiler, python headers installed.
PyInstaller could help you bundle a binary distribution for users. If there are no C extensions; it is easy to create the archive with all dependencies ready to use by hand.
pkgme might help you to automate packaging for Ubuntu. You could also use stdeb to manually generate .deb files.
A meaningful setup.py describing dependencies, scripts entry points, etc is a prerequisite for all the above.

Categories