I would like to analyze the dependency tree of Python packages. How can I obtain this data?
Things I already know
setup.py sometimes contains a requires field that lists package dependencies
PyPI is an online repository of Python packages
PyPI has an API
Things that I don't know
Very few projects (around 10%) on PyPI explicitly list dependencies in the requires field, but pip/easy_install still manage to download the correct packages. What am I missing? For example, the popular library for statistical computing, pandas, doesn't list requires but still manages to install numpy, pytz, etc. Is there a better way to automatically collect the full list of dependencies?
Is there a pre-existing database somewhere? Am I repeating existing work?
Do similar, easily accessible databases exist for other languages with distribution systems (R, Clojure, etc.)?
You should be looking at the install_requires field instead; see New and changed setup keywords.
requires is deemed too vague a field to rely on for dependency installation. In addition, there are setup_requires and tests_require fields for dependencies needed by setup.py itself and for running tests.
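For illustration, a minimal setup.py using those keywords might look like this (the package and dependency names are placeholders, not taken from the question):
from setuptools import setup

setup(
    name='mypackage',  # placeholder project name
    version='0.1',
    packages=['mypackage'],
    # Installed automatically by pip/easy_install:
    install_requires=['numpy', 'pytz'],
    # Needed to run setup.py itself:
    setup_requires=['setuptools_scm'],
    # Needed only to run the test suite:
    tests_require=['pytest'],
)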
Certainly, the dependency graph has been analyzed before; from this blog article by Olivier Girardot comes this fantastic image:
The image is linked to the interactive version of the graph.
Using a tool like pip, you can list all requirements for each package.
The command is:
pip install --no-install package_name
(Note: the --no-install flag has since been removed from pip; pip download package_name is the closest modern equivalent.)
You can reuse part of pip in your script. The part responsible for parsing requirements files is the pip.req module.
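For instance, a sketch using that module, assuming an older pip release where pip.req is still importable (the signature of parse_requirements has changed across pip versions, and on some of them it also needs a session argument):
from pip.req import parse_requirements

# Parses the file into InstallRequirement objects; on newer pip releases
# this internal API was moved and eventually removed.
install_reqs = parse_requirements('requirements.txt')
print([str(ir.req) for ir in install_reqs])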
Here is how you can do it programmatically using the python pip package:
from pip._vendor import pkg_resources  # ensure your pip index-url points at the real PyPI index

# Get dependencies from pip's view of the installed package
package_name = 'Django'
try:
    # Raises KeyError if the package is not installed
    package_resources = pkg_resources.working_set.by_key[package_name.lower()]
    # Note: _dep_map.keys() must be wrapped in list() under Python 3
    dependencies = list(package_resources._dep_map.keys()) + [str(r) for r in package_resources.requires()]
    dependencies = list(set(dependencies))
except KeyError:
    dependencies = []
And here is how you can get dependencies from the PyPI JSON API:
import requests

package_name = 'Django'

# Package info URL
PYPI_API_URL = 'https://pypi.python.org/pypi/{package_name}/json'

package_details_url = PYPI_API_URL.format(package_name=package_name)
response = requests.get(package_details_url)

dependencies = []
if response.status_code == 200:
    data = response.json()
    # requires_dist is the field PyPI actually populates; the others are
    # checked defensively and are usually absent or None
    for field in ('requires_dist', 'requires', 'setup_requires',
                  'test_requires', 'install_requires'):
        value = data['info'].get(field)
        if value:
            dependencies.extend(value)
    dependencies = list(set(dependencies))
You can use recursion to fetch the dependencies of dependencies and build the full tree. Cheers!
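A minimal sketch of that recursion against the PyPI JSON API (the requirement strings are parsed naively here; a real implementation would handle version specifiers and environment markers properly):
import requests

PYPI_API_URL = 'https://pypi.python.org/pypi/{package_name}/json'

def dependency_tree(package_name, seen=None):
    # Track visited packages to avoid cycles
    seen = seen if seen is not None else set()
    if package_name.lower() in seen:
        return {}
    seen.add(package_name.lower())

    response = requests.get(PYPI_API_URL.format(package_name=package_name))
    if response.status_code != 200:
        return {}
    requires_dist = response.json()['info'].get('requires_dist') or []

    tree = {}
    for requirement in requires_dist:
        # Naively strip environment markers and version specifiers
        name = requirement.split(';')[0].split('(')[0].split(' ')[0].strip()
        tree[name] = dependency_tree(name, seen)
    return tree

print(dependency_tree('Django'))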
Related
I want to write a setup.py file using setuptools. My package depends on tensorflow, but there are two different pip packages that fulfill the requirement, tensorflow and tensorflow-gpu. If I just put tensorflow in my setup(..., install_requires=["tensorflow"]), then installation will fail if the user has instead installed the tensorflow-gpu pip package on their system.
The imp module can't be used to check (as in this answer: How to check if a python module exists without importing it), because the import name of the module is tensorflow regardless of which pip package the user installed. So how does setuptools (and therefore distutils) detect which pip package was installed? I've dug through the source a bit, but can't find the place that it checks.
Note: I am not planning on hacking setuptools to accept either. I just want to know what method it is using to detect the package, so I can use that same method in my setup.py to manually set the install_requires param to the correct version. (i.e. like this: Alternative dependencies (fall back) in setup.py)
I answered a similar question recently. You need to distinguish one TF from the other. I don't know TF well enough to help with the details, but most of the code should look like this:
kw = {}

try:
    import tensorflow
except ImportError:
    # There is no TF at all; declare the dependency
    kw['install_requires'] = ['tensorflow']
else:
    if is_gpu(tensorflow):  # is_gpu() is a placeholder you would have to supply
        kw['install_requires'] = ['tensorflow-gpu']
    else:
        kw['install_requires'] = ['tensorflow']

setup(
    …
    **kw
)
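One hedged way to implement the is_gpu placeholder without touching the TF API at all (my own sketch, not from the answer above) is to ask pkg_resources which distribution is installed, since the two pip packages differ in distribution name even though both import as tensorflow:
import pkg_resources

def installed_tf_dist():
    # Return the name of the installed TF distribution, or None
    for dist_name in ('tensorflow-gpu', 'tensorflow'):
        try:
            pkg_resources.get_distribution(dist_name)
            return dist_name
        except pkg_resources.DistributionNotFound:
            pass
    return None

kw = {'install_requires': [installed_tf_dist() or 'tensorflow']}
This sidesteps the import entirely, which also addresses the question's point about imp: it is the distribution metadata, not the module name, that distinguishes the two packages.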
Given requirements.txt and a virtualenv environment, what is the best way to check from a script whether requirements are met and possibly provide details in case of mismatch?
Pip changes its internal API with major releases, so I've seen advice not to use its parse_requirements method.
There is pkg_resources.require(dependencies), but then how do I parse a requirements file with all its fanciness, like github links, etc.?
This should be something pretty simple, but I can't find any pointers.
UPDATE: a programmatic solution is needed.
You can save your virtualenv's current installed packages with pip freeze to a file, say current.txt
pip freeze > current.txt
Then you can compare this to requirements.txt with difflib, using a script like this:
import difflib

# Sort both inputs: ndiff is order-sensitive, and pip freeze output is
# sorted while requirements.txt may not be.
with open('requirements.txt') as req, open('current.txt') as current:
    diff = difflib.ndiff(sorted(req.readlines()), sorted(current.readlines()))

delta = ''.join(x for x in diff if x.startswith('-'))
print(delta)
This displays only the packages that are in requirements.txt but not in current.txt.
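Since the update asks for a programmatic check, here is a sketch using pkg_resources instead of diffing text (it raises on mismatch rather than printing a delta, and it will not understand the requirements.txt fanciness mentioned in the question, such as github links):
import pkg_resources

with open('requirements.txt') as f:
    # Keep only plain requirement lines; comments, blank lines, and
    # option lines like '-e ...' are beyond this sketch.
    requirements = [line.strip() for line in f
                    if line.strip() and not line.startswith(('#', '-'))]

try:
    pkg_resources.require(requirements)
    print('All requirements satisfied.')
except pkg_resources.DistributionNotFound as e:
    print('Missing:', e)
except pkg_resources.VersionConflict as e:
    print('Version conflict:', e)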
Got tired of the discrepancies between requirements.txt and the actually installed packages (e.g. when deploying to Heroku, I'd often get a ModuleNotFoundError for forgetting to add a module to requirements).
This helps:
Use compare-requirements (GitHub)
(you'll need to pip install pipdeptree to use it.)
It's then as simple as...
cmpreqs --pipdeptree
...to show you (in "Input 2") which modules are installed, but missing from requirements.txt.
You can then examine the list and see which ones should in fact be added to requirements.txt.
I'm trying to install a python package from the private reportlab pypi server using zc.buildout.
When I install using the instructions provided on their own site, it installs without problems. http://www.reportlab.com/reportlabplus/installation/
If, however, I install using zc.buildout, I keep getting Couldn't find distributions for 'rlextra'. I added their pypi repo to find-links, so I'm not sure what I'm missing.
My buildout config:
[buildout]
versions = versions
include-site-packages = false
extensions = mr.developer
unzip = true
find-links = https://[user]:[pass]@www.reportlab.com/pypi
parts =
    python
    django
    compass-config
auto-checkout = *
eggs =
    ...
    rlextra
    ...
... etc.
Edit: I should point out that in the end I did a manual download of the package and used it in my buildout as a develop package. Even though this solves the immediate issue, I would still like to know why my original setup is not working.
You are passing in the PyPI main link for the find-links URL, but find-links only works with the simple index style pages (which exist per package on PyPI).
For example, the beautifulsoup4 package has a simple index page at https://pypi.python.org/simple/beautifulsoup4/.
The ReportLab server also has simple pages; add the one for this package to your buildout:
find-links = https://[user]:[pass]@www.reportlab.com/pypi/simple/rlextra/
IIRC you can also add the top-level https://[user]:[pass]@www.reportlab.com/pypi/simple URL as a find-links, but being more specific saves on URL round-trips.
The Twisted Plugin System is the preferred way to write extensible twisted applications.
However, due to the way the plugin system is structured (plugins go into a twisted/plugins directory which should not be a Python package), writing a proper setup.py for installing those plugins appears to be non-trivial.
I've seen some attempts that add 'twisted.plugins' to the 'packages' key of the distutils setup command, but since it is not really a package, bad things happen (for example, an __init__.py is helpfully added by some tools).
Other attempts seem to use 'package_data' instead (eg, http://bazaar.launchpad.net/~glyph/divmod.org/trunk/view/head:/Epsilon/epsilon/setuphelper.py), but that can also fail in weird ways.
The question is: has anyone successfully written a setup.py for installing twisted plugins which works in all cases?
I document a setup.py below that is needed only if you have users with pip < 1.2 (e.g. on Ubuntu 12.04). If everyone has pip 1.2 or newer, the only thing you need is packages=[..., 'twisted.plugins'].
By preventing pip from writing the line "twisted" to .egg-info/top_level.txt, you can keep using packages=[..., 'twisted.plugins'] and have a working pip uninstall that doesn't remove all of twisted/. This involves monkeypatching setuptools/distribute near the top of your setup.py. Here is a sample setup.py:
from distutils.core import setup

# When pip installs anything from packages, py_modules, or ext_modules that
# includes a twistd plugin (which are installed to twisted/plugins/),
# setuptools/distribute writes a Package.egg-info/top_level.txt that includes
# "twisted". If you later uninstall Package with `pip uninstall Package`,
# pip <1.2 removes all of twisted/ instead of just Package's twistd plugins.
# See https://github.com/pypa/pip/issues/355 (now fixed)
#
# To work around this problem, we monkeypatch
# setuptools.command.egg_info.write_toplevel_names to not write the line
# "twisted". This fixes the behavior of `pip uninstall Package`. Note that
# even with this workaround, `pip uninstall Package` still correctly uninstalls
# Package's twistd plugins from twisted/plugins/, since pip also uses
# Package.egg-info/installed-files.txt to determine what to uninstall,
# and the paths to the plugin files are indeed listed in installed-files.txt.
try:
    from setuptools.command import egg_info
    egg_info.write_toplevel_names
except (ImportError, AttributeError):
    pass
else:
    def _top_level_package(name):
        return name.split('.', 1)[0]

    def _hacked_write_toplevel_names(cmd, basename, filename):
        pkgs = dict.fromkeys(
            [_top_level_package(k)
             for k in cmd.distribution.iter_distribution_names()
             if _top_level_package(k) != "twisted"
            ]
        )
        cmd.write_file("top-level names", filename, '\n'.join(pkgs) + '\n')

    egg_info.write_toplevel_names = _hacked_write_toplevel_names

setup(
    name='MyPackage',
    version='1.0',
    description="You can do anything with MyPackage, anything at all.",
    url="http://example.com/",
    author="John Doe",
    author_email="jdoe@example.com",
    packages=['mypackage', 'twisted.plugins'],
    # You may want more options here, including install_requires=,
    # package_data=, and classifiers=
)

# Make Twisted regenerate the dropin.cache, if possible. This is necessary
# because in a site-wide install, dropin.cache cannot be rewritten by
# normal users.
try:
    from twisted.plugin import IPlugin, getPlugins
except ImportError:
    pass
else:
    list(getPlugins(IPlugin))
I've tested this with pip install, pip install --user, and easy_install. With any install method, the above monkeypatch and pip uninstall work fine.
You might be wondering: do I need to clear the monkeypatch to avoid messing up the next install? (e.g. pip install --no-deps MyPackage Twisted; you wouldn't want to affect Twisted's top_level.txt.) The answer is no; the monkeypatch does not affect another install because pip spawns a new python for each install.
Related: keep in mind that in your project, you must not have a file twisted/plugins/__init__.py. If you see this warning during installation:
package init file 'twisted/plugins/__init__.py' not found (or not a regular file)
it is completely normal and you should not try to fix it by adding an __init__.py.
Here is a blog entry which describes doing it with 'package_data':
http://chrismiles.livejournal.com/23399.html
In what weird ways can that fail? It could fail if the installation of the package doesn't put the package data into a directory which is on the sys.path. In that case the Twisted plugin loader wouldn't find it. However, all installations of Python packages that I know of will put it into the same directory where they are installing the Python modules or packages themselves, so that won't be a problem.
Maybe you could adapt the package_data idea to use data_files instead: it wouldn't require you to list twisted.plugins as a package, as it uses absolute paths. It would still be a kludge, though.
My tests with pure distutils have told me that it is possible to overwrite files from another distribution. I wanted to test poor man's namespace packages using pkgutil.extend_path and distutils, and it turns out that I can install spam/ham/__init__.py with spam.ham/setup.py and spam/eggs/__init__.py with spam.eggs/setup.py. Directories are not a problem, but files will be happily overwritten. I think this is actually undefined behavior in distutils which trickles up to setuptools and pip, so pip could IMO close as wontfix.
What is the usual way to install Twisted plugins? Dropping them in by hand?
I use this approach:
Put '.py' and '.pyc' versions of your file into the "twisted/plugins/" folder inside your package.
Note that the '.pyc' file can be empty; it just has to exist.
In setup.py specify copying both files to a library folder (make sure that you will not overwrite existing plugins!). For example:
# setup.py
import os

from distutils import sysconfig
from distutils.core import setup

LIB_PATH = sysconfig.get_python_lib()

# ...

plugin_name = '<your_package>/twisted/plugins/<plugin_name>'

# '.pyc' extension is necessary for correct plugin removal
data_files = [
    (os.path.join(LIB_PATH, 'twisted', 'plugins'),
     [''.join((plugin_name, extension)) for extension in ('.py', '.pyc')])
]

setup(
    # ...
    data_files=data_files
)
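If you don't want to keep an empty .pyc checked in, one option (my own sketch, not part of the original answer; the placeholder path matches the one above) is to generate it before calling setup() with the standard py_compile module:
import py_compile

# Write the .pyc right next to the .py so the data_files entry above
# picks up both; without cfile=, Python 3 would write into __pycache__/.
plugin_source = '<your_package>/twisted/plugins/<plugin_name>.py'
py_compile.compile(plugin_source, cfile=plugin_source + 'c')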
I'm trying to deploy OpenERP with a buildout and my own piece of code. In fact I would like to build a complete deployment structure allowing me to use OpenERP with custom modules and patches.
First of all, before adding any personal configuration, I was trying to create a buildout which would have the responsibility of configuring everything.
Buildout Configuration
My buildout.cfg configuration file look like this:
[buildout]
parts = eggs
versions = versions
newest = false
extensions = lovely.buildouthttp
unzip = true
find-links =
    http://download.gna.org/pychart/

[versions]

[eggs]
recipe = zc.recipe.egg
interpreter = python
eggs =
    Paste
    PasteScript
    PasteDeploy
    psycopg2
    PyChart
    pydot
    openerp-server
Configuration problem
But when trying to launch the buildout I get a couple of errors when trying to install the last needed egg (openerp-server).
On my side it just cannot find these modules, even though they are in my eggs dir:
Error: python module psycopg2 (PostgreSQL module) is required
Error: python module libxslt (libxslt python bindings) is required
Error: python module pychart (pychart module) is required
Error: python module pydot (pydot module) is required
error: Setup script exited with 1
An error occurred when trying to install openerp-server 5.0.0-3. Look above this message for any errors that were output by easy_install.
Is it possible that openerp hardcoded its search path somewhere?
easy_install, a try
I decided to give a clean virtualenv, without any relation to the main site-packages, a try. But when using easy_install on openerp-server:
$ source openerp-python/bin/activate
$ easy_install openerp-server
...
File "build/bdist.linux-i686/egg/pkg_resources.py", line 887, in extraction_error
pkg_resources.ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg
cache:
SandboxViolation: mkdir('/home/mlhamel/.python-eggs/psycopg2-2.0.13-py2.5-linux-x86_64.egg-tmp', 511) {}
I always get this error message, whether or not psycopg2 is installed on my machine.
System's Configuration
Ubuntu 9.10 x86-64
Tried on Python 2.5/Python 2.6
Ok I did this recently:
Don't try to install the egg; openerp is not really standard.
I used this buildout snippet:
# get the openerp-stuff as a distutils package
[openerp-server]
recipe = zerokspot.recipe.distutils
urls = http://www.openerp.com/download/stable/source/openerp-server-5.0.6.tar.gz
# similar idea for the web component
[openerp-web]
recipe = zc.recipe.egg:scripts
find-links = http://www.openerp.com/download/stable/source/openerp-web-5.0.6.tar.gz
# add some symlinks so you can run it out of bin
[server-symlinks]
recipe = cns.recipe.symlink
symlink = ${buildout:parts-directory}/openerp-server/bin/openerp-server = ${buildout:bin-directory}
The key, however, is that I did not use virtualenv. You don't need to with buildout. Buildout + virtualenv is like Trojan + Ramses... one is enough, unless you are ... well one is enough. ;)
Now for this particular project I had followed the debian instructions and installed the required libs via aptitude. This was only because I was new to buildout at the time; one could just as easily install the psycopg2 module through buildout.
Here are some excellent instructions. Ignore the django stuff if you don't need it. Dan Fairs is both a great writer and great buildout tutor. Check it out. Disclaimer: I am a disciple of the man, based on his buildout usage.
I am certain you do not want to use the egg on PyPI; it never worked for me. OpenERP is not eggified, it's a distutils package.
Good luck!
Just for the record: there is a buildout recipe for OpenERP available on PyPI.
I'm not familiar with buildout, but if I were going to try building an OpenERP installer, I'd start by looking at the nice one from Open Source Consulting. I've used it and been pretty happy with it.
Last time I checked, it didn't set up the CRM e-mail gateway, but everything else I needed was covered.