I'm working on packaging a Python interface to a C library. The C library comes as a binary distribution tarball with headers and the compiled library. I want to make a bdist_wheel out of it, along with my built Python extensions, and the headers.
I've written a couple of distutils commands for extracting and installing the library like so:
from distutils.core import Command
from distutils.command.build import build
import os
import tarfile
class ExtractLibraryCommand(Command):
description = 'extract library from binary distribution'
def initialize_options(self):
self.build_lib = None
self.build_temp = None
self.library_dist = os.environ.get('LIBRARY_DIST')
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'))
assert os.path.exists(self.library_dist), 'Library dist {} does not exist'.format(self.library_dist)
def run(self):
with tarfile.open(self.library_dist, 'r') as tf:
tf.extractall(path=self.build_temp)
class InstallLibraryCommand(Command):
description = 'install library from extracted distribution'
def initialize_options(self):
self.build_lib = None
self.build_temp = None
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'))
def run(self):
self.copy_tree(
os.path.join(os.path.join(build_temp, 'my_library')),
os.path.join(self.build_lib, os.path.join('my_package', 'my_library'))
)
Then I override the build step to include my new commands.
class BuildCommand(build):
def run(self):
self.run_command('extract_library')
self.run_command('install_library')
build.run(self)
The problem is, I'm not sure how to get the path to the headers for the library to build my extensions, as they're installed to a directory specified by distutils.
from setuptools import setup, find_packages
from setuptools.extension import Extension
from Cython.Build import cythonize
extensions = [
Extension(
'package.library.*',
['package/library/*.pyx'],
include_dirs=???,
),
]
setup(
packages=find_packages(),
...
ext_modules=cythonize(extensions),
)
EDIT: To clarify, this is one setup.py script.
You can modify the extensions in the InstallLibraryCommand, after the library becomes available. I'd probably also move the extraction/installation code to finalize_options instead of run as installing the library in building stage is somewhat late in my opinion (makes the library unavailable in the configuration stage). Example:
class InstallLibraryCommand(Command):
def finalize_options(self):
...
with tarfile.open(self.library_dist, 'r') as tf:
tf.extractall(path=self.build_temp)
include_path = os.path.join(self.build_lib, os.path.join('my_package', 'my_library'))
self.copy_tree(
os.path.join(os.path.join(build_temp, 'my_library')),
include_path
)
for ext in self.distribution.ext_modules:
ext.include_dirs.append(include_path)
After deliberating on this problem a little more, I've decided to commit the library's headers with the Cython interface, as they're part of the interface, and required for building. This way, the code documents and uses a particular fixed version, and can be distributed with compatible binaries.
I wrote my project in python and now I want to pack it into python package like here : Python packaging tutorial. The thing is I don't know how to properly pack my main.py ( I named it __ init __.py same like in tutorial) where the arg parse is being used. My file structure currently looks like this:
information_analyzer/
information_analyzer/
__init__.py
ArticleFile.py
ArticleInspector.py
SearchEngine.py
setup.py
.gitignore
README.md
Here is my setup.py:
from setuptools import setup
setup(name='information_analyzer',
version='0.3',
description='Phrase and article analyzer',
url='https://github.com/my_repo_link',
author='Name Surname',
author_email='mail#gmail.com',
license='MIT',
packages=['information_analyzer'],
include_package_data=True,
zip_safe=False)
Here is my __ init __.py:
import argparse
import sys
from information_analyzer.information_analyzer.ResultsChecker import check_result
from information_analyzer.information_analyzer.SearchEngine import SearchEngine
from information_analyzer.information_analyzer.ArticleInspector import ArticleInspector
def main():
parser = argparse.ArgumentParser(description='Information analyzer')
parser.add_argument('--phrase', default=None, type=str, help='Phrase for verification')
parser.add_argument('--articleURL', default=None, type=str, help='URL to article')
args = parser.parse_args()
articleURL=args.articleURL # for example http://fedsalert.com/donald-trump-dead-from-a-fatal-heart-attack/
phrase=args.phrase
if(phrase is None and articleURL is None):
print("You must specify phrase or article URL to run the program.")
sys.exit()
if phrase is not None:
search_engine = SearchEngine()
check_result(phrase, search_engine)
if articleURL is not None:
article_inspector=ArticleInspector(articleURL)
article_inspector.print_all_informations()
When I installed pip install . and tried to import it in python console I got that info:
>>> import information_analyzer
>>> information_analyzer.main()
AttributeError: module 'information-analyzer' has no attribute 'main'
Here is the current state of my twistd plugin, which is located in project_root/twisted/plugins/my_plugin.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from zope.interface import implements
from twisted.plugin import IPlugin
from twisted.python.usage import Options
from twisted.application import internet, service
from mylib.io import MyFactory
class Options(Options):
"""Flags and options for the plugin."""
optParameters = [
('sock', 's', '/tmp/io.sock', 'Path to IO socket'),
]
class MyServiceMaker(object):
implements(service.IServiceMaker, IPlugin)
tapname = "myplugin"
description = "description for my plugin"
options = Options
def makeService(self, options):
return internet.UNIXServer(options['sock'], MyFactory())
There is no __init__.py file in project_root/twisted/plugins/
The output of twistd doesn't show my plugin when run from the project's root directory
I installed my library via python setup.py develop --user and it is importable from anywhere
Any ideas?
As suspected, it was something very simple: I needed to instantiate an instance of MyServiceMaker, so simply adding service_maker = MyServiceMaker() at the bottom of the script fixes the issue.
We use py2app extensively at our facility to produce self contained .app packages for easy internal deployment without dependency issues. Something I noticed recently, and have no idea how it began, is that when building an .app, py2app started including the .git directory of our main library.
commonLib, for instance, is our root python library package, which is a git repo. Under this package are the various subpackages such as database, utility, etc.
commonLib/
|- .git/ # because commonLib is a git repo
|- __init__.py
|- database/
|- __init__.py
|- utility/
|- __init__.py
# ... etc
In a given project, say Foo, we will do imports like from commonLib import xyz to use our common packages. Building via py2app looks something like: python setup.py py2app
So the recent issue I am seeing is that when building an app for project Foo, I will see it include everything in commonLib/.git/ into the app, which is extra bloat. py2app has an excludes option but that only seems to be for python modules. I cant quite figure out what it would take to exclude the .git subdir, or in fact, what is causing it to be included in the first place.
Has anyone experienced this when using a python package import that is a git repo?
Nothing has changed in our setup.py files for each project, and commonLib has always been a git repo. So the only thing I can think of being a variable is the version of py2app and its deps which have obviously been upgraded over time.
Edit
I'm using the latest py2app 0.6.4 as of right now. Also, my setup.py was first generated from py2applet a while back, but has been hand configured since and copied over as a template for every new project. I am using PyQt4/sip for every single one of these projects, so it also makes me wonder if its an issue with one of the recipes?
Update
From the first answer, I tried to fix this using various combinations of exclude_package_data settings. Nothing seems to force the .git directory to become excluded. Here is a sample of what my setup.py files generally look like:
from setuptools import setup
from myApp import VERSION
appname = 'MyApp'
APP = ['myApp.py']
DATA_FILES = []
OPTIONS = {
'includes': 'atexit, sip, PyQt4.QtCore, PyQt4.QtGui',
'strip': True,
'iconfile':'ui/myApp.icns',
'resources':['src/myApp.png'],
'plist':{
'CFBundleIconFile':'ui/myApp.icns',
'CFBundleIdentifier':'com.company.myApp',
'CFBundleGetInfoString': appname,
'CFBundleVersion' : VERSION,
'CFBundleShortVersionString' : VERSION
}
}
setup(
app=APP,
data_files=DATA_FILES,
options={'py2app': OPTIONS},
setup_requires=['py2app'],
)
I have tried things like:
setup(
...
exclude_package_data = { 'commonLib': ['.git'] },
#exclude_package_data = { '': ['.git'] },
#exclude_package_data = { 'commonLib/.git/': ['*'] },
#exclude_package_data = { '.git': ['*'] },
...
)
Update #2
I have posted my own answer which does a monkeypatch on distutils. Its ugly and not preferred, but until someone can offer me a better solution, I guess this is what I have.
I am adding an answer to my own question, to document the only thing I have found to work thus far. My approach was to monkeypatch distutils to ignore certain patterns when creating a directory or copying a file. This is really not what I wanted to do, but like I said, its the only thing that works so far.
## setup.py ##
import re
# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
def wrapper(fn):
def wrapped(src, *args, **kwargs):
if not re.search(r'/\.git/?', src):
fn(src, *args, **kwargs)
return wrapped
file_util.copy_file = wrapper(file_util.copy_file)
dir_util.mkpath = wrapper(dir_util.mkpath)
# now import setuptools so it uses the monkeypatched methods
from setuptools import setup
Hopefully someone will comment on this and tell me a higher level approach to avoid doing this. But as of now, I will probably wrap this into a utility method like exclude_data_patterns(re_pattern) to be reused in my projects.
I can see two options for excluding the .git directory.
Build the application from a 'clean' checkout of the code. When deploying a new version, we always build from a fresh svn export based on a tag to ensure we don't pick up spurious changes/files. You could try the equivalent here - although the git equivalent seems somewhat more involved.
Modify the setup.py file to massage the files included in the application. This might be done using the exclude_package_data functionality as described in the docs, or build the list of data_files and pass it to setup.
As for why it has suddenly started happening, knowing the version of py2app you are using might help, as will knowing the contents of your setup.py and perhaps how this was made (by hand or using py2applet).
I have a similar experience with Pyinstaller, so I'm not sure it applies directly.
Pyinstaller creates a "manifest" of all files to be included in the distribution, before running the export process. You could "massage" this manifest, as per Mark's second suggestion, to exclude any files you want. Including anything within .git or .git itself.
In the end, I stuck with checking out my code before producing a binary as there was more than just .git being bloat (such as UML documents and raw resource files for Qt). A checkout guaranteed a clean result and I experienced no issues automating that process along with the process of creating the installer for the binary.
There is a good answer to this, but I have a more elaborate answer to solve the problem mentioned here with a white-list approach. To have the monkey patch also work for packages outside site-packages.zip I had to monkey patch also copy_tree (because it imports copy_file inside its function), this helps in making a standalone application.
In addition, I create a white-list recipe to mark certain packages zip-unsafe. The approach makes it easy to add filters other than white-list.
import pkgutil
from os.path import join, dirname, realpath
from distutils import log
# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
# noinspection PyUnresolvedReferences
from py2app import util
def keep_only_filter(base_mod, sub_mods):
prefix = join(realpath(dirname(base_mod.filename)), '')
all_prefix = [join(prefix, sm) for sm in sub_mods]
log.info("Set filter for prefix %s" % prefix)
def wrapped(mod):
name = getattr(mod, 'filename', None)
if name is None:
# ignore anything that does not have file name
return True
name = join(realpath(dirname(name)), '')
if not name.startswith(prefix):
# ignore those that are not in this prefix
return True
for p in all_prefix:
if name.startswith(p):
return True
# log.info('ignoring %s' % name)
return False
return wrapped
# define all the filters we need
all_filts = {
'mypackage': (keep_only_filter, [
'subpackage1', 'subpackage2',
]),
}
def keep_only_wrapper(fn, is_dir=False):
filts = [(f, k[1]) for (f, k) in all_filts.iteritems()
if k[0] == keep_only_filter]
prefixes = {}
for f, sms in filts:
pkg = pkgutil.get_loader(f)
assert pkg, '{f} package not found'.format(f=f)
p = join(pkg.filename, '')
sp = [join(p, sm, '') for sm in sms]
prefixes[p] = sp
def wrapped(src, *args, **kwargs):
name = src
if not is_dir:
name = dirname(src)
name = join(realpath(name), '')
keep = True
for prefix, sub_prefixes in prefixes.iteritems():
if name == prefix:
# let the root pass
continue
# if it is a package we have a filter for
if name.startswith(prefix):
keep = False
for sub_prefix in sub_prefixes:
if name.startswith(sub_prefix):
keep = True
break
if keep:
return fn(src, *args, **kwargs)
return []
return wrapped
file_util.copy_file = keep_only_wrapper(file_util.copy_file)
dir_util.mkpath = keep_only_wrapper(dir_util.mkpath, is_dir=True)
util.copy_tree = keep_only_wrapper(util.copy_tree, is_dir=True)
class ZipUnsafe(object):
def __init__(self, _module, _filt):
self.module = _module
self.filt = _filt
def check(self, dist, mf):
m = mf.findNode(self.module)
if m is None:
return None
# Do not put this package in site-packages.zip
if self.filt:
return dict(
packages=[self.module],
filters=[self.filt[0](m, self.filt[1])],
)
return dict(
packages=[self.module]
)
# Any package that is zip-unsafe (uses __file__ ,... ) should be added here
# noinspection PyUnresolvedReferences
import py2app.recipes
for module in [
'sklearn', 'mypackage',
]:
filt = all_filts.get(module)
setattr(py2app.recipes, module, ZipUnsafe(module, filt))
I have a few Munin plugins which report stats from an Autonomy database. They all use a small library which scrapes the XML status output for the relevant numbers.
I'm trying to bundle the library and plugins into a Puppet-installable RPM. The actual RPM-building should be straightforward; once I have a distutils-produced distfile I can make it into an RPM based on a .spec file pinched from the Dag or EPEL repos [1]. It's the distutils bit I'm unsure of -- in fact I'm not even sure my library is correctly written for packaging. Here's how it works:
idol7stats.py:
import datetime
import os
import stat
import sys
import time
import urllib
import xml.sax
class IDOL7Stats:
cache_dir = '/tmp'
def __init__(self, host, port):
self.host = host
self.port = port
# ...
def collect(self):
self.data = self.__parseXML(self.__getXML())
def total_slots(self):
return self.data['Service:Documents:TotalSlots']
Plugin code:
from idol7stats import IDOL7Stats
a = IDOL7Stats('db.example.com', 23113)
a.collect()
print a.total_slots()
I guess I want idol7stats.py to wind up in /usr/lib/python2.4/site-packages/idol7stats, or something else in Python's search path. What distutils magic do I need? This:
from distutils.core import setup
setup(name = 'idol7stats',
author = 'Me',
author_email = 'me#example.com',
version = '0.1',
py_modules = ['idol7stats'])
almost works, except the code goes in /usr/lib/python2.4/site-packages/idol7stats.py, not a subdirectory. I expect this is down to my not understanding the difference between modules/packages/other containers in Python.
So, what's the rub?
[1] Yeah, I could just plonk the library in /usr/lib/python2.4/site-packages using RPM but I want to know how to package Python code.
You need to create a package to do what you want. You'd need a directory named idol7stats containing a file called __init__.py and any other library modules to package. Also, this will affect your scripts' imports; if you put idol7stats.py in a package called idol7stats, then your scripts need to "import idol7stats.idol7stats".
To avoid that, you could just rename idol7stats.py to idol7stats/__init__.py, or you could put this line into idol7stats/__init__.py to "massage" the imports into the way you expect them:
from idol7stats.idol7stats import *