Packaging a Python library

Packaging a Python library - python

I have a few Munin plugins which report stats from an Autonomy database. They all use a small library which scrapes the XML status output for the relevant numbers.
I'm trying to bundle the library and plugins into a Puppet-installable RPM. The actual RPM-building should be straightforward; once I have a distutils-produced distfile I can make it into an RPM based on a .spec file pinched from the Dag or EPEL repos [1]. It's the distutils bit I'm unsure of -- in fact I'm not even sure my library is correctly written for packaging. Here's how it works:
idol7stats.py:
import datetime
import os
import stat
import sys
import time
import urllib
import xml.sax
class IDOL7Stats:
cache_dir = '/tmp'
def __init__(self, host, port):
self.host = host
self.port = port
# ...
def collect(self):
self.data = self.__parseXML(self.__getXML())
def total_slots(self):
return self.data['Service:Documents:TotalSlots']
Plugin code:
from idol7stats import IDOL7Stats
a = IDOL7Stats('db.example.com', 23113)
a.collect()
print a.total_slots()
I guess I want idol7stats.py to wind up in /usr/lib/python2.4/site-packages/idol7stats, or something else in Python's search path. What distutils magic do I need? This:
from distutils.core import setup
setup(name = 'idol7stats',
author = 'Me',
author_email = 'me#example.com',
version = '0.1',
py_modules = ['idol7stats'])
almost works, except the code goes in /usr/lib/python2.4/site-packages/idol7stats.py, not a subdirectory. I expect this is down to my not understanding the difference between modules/packages/other containers in Python.
So, what's the rub?
[1] Yeah, I could just plonk the library in /usr/lib/python2.4/site-packages using RPM but I want to know how to package Python code.

You need to create a package to do what you want. You'd need a directory named idol7stats containing a file called __init__.py and any other library modules to package. Also, this will affect your scripts' imports; if you put idol7stats.py in a package called idol7stats, then your scripts need to "import idol7stats.idol7stats".
To avoid that, you could just rename idol7stats.py to idol7stats/__init__.py, or you could put this line into idol7stats/__init__.py to "massage" the imports into the way you expect them:
from idol7stats.idol7stats import *

Related

Can I add an include path in a distutils command?

I'm working on packaging a Python interface to a C library. The C library comes as a binary distribution tarball with headers and the compiled library. I want to make a bdist_wheel out of it, along with my built Python extensions, and the headers.
I've written a couple of distutils commands for extracting and installing the library like so:
from distutils.core import Command
from distutils.command.build import build
import os
import tarfile
class ExtractLibraryCommand(Command):
description = 'extract library from binary distribution'
def initialize_options(self):
self.build_lib = None
self.build_temp = None
self.library_dist = os.environ.get('LIBRARY_DIST')
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'))
assert os.path.exists(self.library_dist), 'Library dist {} does not exist'.format(self.library_dist)
def run(self):
with tarfile.open(self.library_dist, 'r') as tf:
tf.extractall(path=self.build_temp)
class InstallLibraryCommand(Command):
description = 'install library from extracted distribution'
def initialize_options(self):
self.build_lib = None
self.build_temp = None
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'))
def run(self):
self.copy_tree(
os.path.join(os.path.join(build_temp, 'my_library')),
os.path.join(self.build_lib, os.path.join('my_package', 'my_library'))
)
Then I override the build step to include my new commands.
class BuildCommand(build):
def run(self):
self.run_command('extract_library')
self.run_command('install_library')
build.run(self)
The problem is, I'm not sure how to get the path to the headers for the library to build my extensions, as they're installed to a directory specified by distutils.
from setuptools import setup, find_packages
from setuptools.extension import Extension
from Cython.Build import cythonize
extensions = [
Extension(
'package.library.*',
['package/library/*.pyx'],
include_dirs=???,
),
]
setup(
packages=find_packages(),
...
ext_modules=cythonize(extensions),
)
EDIT: To clarify, this is one setup.py script.

You can modify the extensions in the InstallLibraryCommand, after the library becomes available. I'd probably also move the extraction/installation code to finalize_options instead of run as installing the library in building stage is somewhat late in my opinion (makes the library unavailable in the configuration stage). Example:
class InstallLibraryCommand(Command):
def finalize_options(self):
...
with tarfile.open(self.library_dist, 'r') as tf:
tf.extractall(path=self.build_temp)
include_path = os.path.join(self.build_lib, os.path.join('my_package', 'my_library'))
self.copy_tree(
os.path.join(os.path.join(build_temp, 'my_library')),
include_path
)
for ext in self.distribution.ext_modules:
ext.include_dirs.append(include_path)

After deliberating on this problem a little more, I've decided to commit the library's headers with the Cython interface, as they're part of the interface, and required for building. This way, the code documents and uses a particular fixed version, and can be distributed with compatible binaries.

From folder_name import variable Python 3.4.2

File setup:
...\Project_Folder
...\Project_Folder\Project.py
...\Project_folder\Script\TestScript.py
I'm attempting to have Project.py import modules from the folder Script based on user input.
Python Version: 3.4.2
Ideally, the script would look something like
q = str(input("Input: "))
from Script import q
However, python does not recognize q as a variable when using import.
I've tried using importlib, however I cannot figure out how to import from the Script folder mentioned above.
import importlib
q = str(input("Input: "))
module = importlib.import_module(q, package=None)
I'm not certain where I would implement the file path.

Repeat of my answer originally posted at How to import a module given the full path?
as this is a Python 3.4 specific question:
This area of Python 3.4 seems to be extremely tortuous to understand, mainly because the documentation doesn't give good examples! This was my attempt using non-deprecated modules. It will import a module given the path to the .py file. I'm using it to load "plugins" at runtime.
def import_module_from_file(full_path_to_module):
"""
Import a module given the full path/filename of the .py file
Python 3.4
"""
module = None
try:
# Get module name and path from full path
module_dir, module_file = os.path.split(full_path_to_module)
module_name, module_ext = os.path.splitext(module_file)
# Get module "spec" from filename
spec = importlib.util.spec_from_file_location(module_name,full_path_to_module)
module = spec.loader.load_module()
except Exception as ec:
# Simple error printing
# Insert "sophisticated" stuff here
print(ec)
finally:
return module
# load module dynamically
path = "<enter your path here>"
module = import_module_from_file(path)
# Now use the module
# e.g. module.myFunction()

I did this by defining the entire import line as a string, formatting the string with q and then using the exec command:
imp = 'from Script import %s' %q
exec imp

py2app picking up .git subdir of a package during build

We use py2app extensively at our facility to produce self contained .app packages for easy internal deployment without dependency issues. Something I noticed recently, and have no idea how it began, is that when building an .app, py2app started including the .git directory of our main library.
commonLib, for instance, is our root python library package, which is a git repo. Under this package are the various subpackages such as database, utility, etc.
commonLib/
|- .git/ # because commonLib is a git repo
|- __init__.py
|- database/
|- __init__.py
|- utility/
|- __init__.py
# ... etc
In a given project, say Foo, we will do imports like from commonLib import xyz to use our common packages. Building via py2app looks something like: python setup.py py2app
So the recent issue I am seeing is that when building an app for project Foo, I will see it include everything in commonLib/.git/ into the app, which is extra bloat. py2app has an excludes option but that only seems to be for python modules. I cant quite figure out what it would take to exclude the .git subdir, or in fact, what is causing it to be included in the first place.
Has anyone experienced this when using a python package import that is a git repo?
Nothing has changed in our setup.py files for each project, and commonLib has always been a git repo. So the only thing I can think of being a variable is the version of py2app and its deps which have obviously been upgraded over time.
Edit
I'm using the latest py2app 0.6.4 as of right now. Also, my setup.py was first generated from py2applet a while back, but has been hand configured since and copied over as a template for every new project. I am using PyQt4/sip for every single one of these projects, so it also makes me wonder if its an issue with one of the recipes?
Update
From the first answer, I tried to fix this using various combinations of exclude_package_data settings. Nothing seems to force the .git directory to become excluded. Here is a sample of what my setup.py files generally look like:
from setuptools import setup
from myApp import VERSION
appname = 'MyApp'
APP = ['myApp.py']
DATA_FILES = []
OPTIONS = {
'includes': 'atexit, sip, PyQt4.QtCore, PyQt4.QtGui',
'strip': True,
'iconfile':'ui/myApp.icns',
'resources':['src/myApp.png'],
'plist':{
'CFBundleIconFile':'ui/myApp.icns',
'CFBundleIdentifier':'com.company.myApp',
'CFBundleGetInfoString': appname,
'CFBundleVersion' : VERSION,
'CFBundleShortVersionString' : VERSION
}
}
setup(
app=APP,
data_files=DATA_FILES,
options={'py2app': OPTIONS},
setup_requires=['py2app'],
)
I have tried things like:
setup(
...
exclude_package_data = { 'commonLib': ['.git'] },
#exclude_package_data = { '': ['.git'] },
#exclude_package_data = { 'commonLib/.git/': ['*'] },
#exclude_package_data = { '.git': ['*'] },
...
)
Update #2
I have posted my own answer which does a monkeypatch on distutils. Its ugly and not preferred, but until someone can offer me a better solution, I guess this is what I have.

I am adding an answer to my own question, to document the only thing I have found to work thus far. My approach was to monkeypatch distutils to ignore certain patterns when creating a directory or copying a file. This is really not what I wanted to do, but like I said, its the only thing that works so far.
## setup.py ##
import re
# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
def wrapper(fn):
def wrapped(src, *args, **kwargs):
if not re.search(r'/\.git/?', src):
fn(src, *args, **kwargs)
return wrapped
file_util.copy_file = wrapper(file_util.copy_file)
dir_util.mkpath = wrapper(dir_util.mkpath)
# now import setuptools so it uses the monkeypatched methods
from setuptools import setup
Hopefully someone will comment on this and tell me a higher level approach to avoid doing this. But as of now, I will probably wrap this into a utility method like exclude_data_patterns(re_pattern) to be reused in my projects.

I can see two options for excluding the .git directory.
Build the application from a 'clean' checkout of the code. When deploying a new version, we always build from a fresh svn export based on a tag to ensure we don't pick up spurious changes/files. You could try the equivalent here - although the git equivalent seems somewhat more involved.
Modify the setup.py file to massage the files included in the application. This might be done using the exclude_package_data functionality as described in the docs, or build the list of data_files and pass it to setup.
As for why it has suddenly started happening, knowing the version of py2app you are using might help, as will knowing the contents of your setup.py and perhaps how this was made (by hand or using py2applet).

I have a similar experience with Pyinstaller, so I'm not sure it applies directly.
Pyinstaller creates a "manifest" of all files to be included in the distribution, before running the export process. You could "massage" this manifest, as per Mark's second suggestion, to exclude any files you want. Including anything within .git or .git itself.
In the end, I stuck with checking out my code before producing a binary as there was more than just .git being bloat (such as UML documents and raw resource files for Qt). A checkout guaranteed a clean result and I experienced no issues automating that process along with the process of creating the installer for the binary.

There is a good answer to this, but I have a more elaborate answer to solve the problem mentioned here with a white-list approach. To have the monkey patch also work for packages outside site-packages.zip I had to monkey patch also copy_tree (because it imports copy_file inside its function), this helps in making a standalone application.
In addition, I create a white-list recipe to mark certain packages zip-unsafe. The approach makes it easy to add filters other than white-list.
import pkgutil
from os.path import join, dirname, realpath
from distutils import log
# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
# noinspection PyUnresolvedReferences
from py2app import util
def keep_only_filter(base_mod, sub_mods):
prefix = join(realpath(dirname(base_mod.filename)), '')
all_prefix = [join(prefix, sm) for sm in sub_mods]
log.info("Set filter for prefix %s" % prefix)
def wrapped(mod):
name = getattr(mod, 'filename', None)
if name is None:
# ignore anything that does not have file name
return True
name = join(realpath(dirname(name)), '')
if not name.startswith(prefix):
# ignore those that are not in this prefix
return True
for p in all_prefix:
if name.startswith(p):
return True
# log.info('ignoring %s' % name)
return False
return wrapped
# define all the filters we need
all_filts = {
'mypackage': (keep_only_filter, [
'subpackage1', 'subpackage2',
]),
}
def keep_only_wrapper(fn, is_dir=False):
filts = [(f, k[1]) for (f, k) in all_filts.iteritems()
if k[0] == keep_only_filter]
prefixes = {}
for f, sms in filts:
pkg = pkgutil.get_loader(f)
assert pkg, '{f} package not found'.format(f=f)
p = join(pkg.filename, '')
sp = [join(p, sm, '') for sm in sms]
prefixes[p] = sp
def wrapped(src, *args, **kwargs):
name = src
if not is_dir:
name = dirname(src)
name = join(realpath(name), '')
keep = True
for prefix, sub_prefixes in prefixes.iteritems():
if name == prefix:
# let the root pass
continue
# if it is a package we have a filter for
if name.startswith(prefix):
keep = False
for sub_prefix in sub_prefixes:
if name.startswith(sub_prefix):
keep = True
break
if keep:
return fn(src, *args, **kwargs)
return []
return wrapped
file_util.copy_file = keep_only_wrapper(file_util.copy_file)
dir_util.mkpath = keep_only_wrapper(dir_util.mkpath, is_dir=True)
util.copy_tree = keep_only_wrapper(util.copy_tree, is_dir=True)
class ZipUnsafe(object):
def __init__(self, _module, _filt):
self.module = _module
self.filt = _filt
def check(self, dist, mf):
m = mf.findNode(self.module)
if m is None:
return None
# Do not put this package in site-packages.zip
if self.filt:
return dict(
packages=[self.module],
filters=[self.filt[0](m, self.filt[1])],
)
return dict(
packages=[self.module]
)
# Any package that is zip-unsafe (uses __file__ ,... ) should be added here
# noinspection PyUnresolvedReferences
import py2app.recipes
for module in [
'sklearn', 'mypackage',
]:
filt = all_filts.get(module)
setattr(py2app.recipes, module, ZipUnsafe(module, filt))

Hiding implementation files in a package

I have a module called spellnum. It can be used as a command-line utility (it has the if __name__ == '__main__': block) or it can be imported like a standard Python module.
The module defines a class named Speller which looks like this:
class Speller(object):
def __init__(self, lang="en"):
module = __import__("spelling_" + lang)
# use module's contents...
As you can see, the class constructor loads other modules at runtime. Those modules (spelling_en.py, spelling_es.py, etc.) are located in the same directory as the spellnum.py itself.
Besides spellnum.py, there are other files with utility functions and classes. I'd like to hide those files since I don't want to expose them to the user and since it's a bad idea to pollute the Python's lib directory with random files. The only way to achieve this that I know of is to create a package.
I've come up with this layout for the project (inspired by this great tutorial):
spellnum/ # project root
spellnum/ # package root
__init__.py
spellnum.py
spelling_en.py
spelling_es.py
squash.py
# ... some other private files
test/
test_spellnum.py
example.py
The file __init__.py contains a single line:
from spellnum import Speller
Given this new layout, the code for dynamic module loading had to be changed:
class Speller(object):
def __init__(self, lang="en"):
spelling_mod = "spelling_" + lang
package = __import__("spellnum", fromlist=[spelling_mod])
module = getattr(package, spelling_mod)
# use module as usual
So, with this project layout a can do the following:
Successfully import spellnum inside example.py and use it like a simple module:
# an excerpt from the example.py file
import spellnum
speller = spellnum.Speller(es)
# ...
import spellnum in the tests and run those tests from the project root like this:
$ PYTHONPATH="`pwd`:$PYTHONPATH" python test/test_spellnum.py
The problem
I cannot execute spellnum.py directly with the new layout. When I try to, it shows the following error:
Traceback (most recent call last):
...
File "spellnum/spellnum.py", line 23, in __init__
module = getattr(package, spelling_mod)
AttributeError: 'module' object has no attribute 'spelling_en'
The question
What's the best way to organize all of the files required by my module to work so that users are able to use the module both from command line and from their Python code?
Thanks!

How about keeping spellnum.py?
spellnum.py
spelling/
__init__.py
en.py
es.py

Your problem is, that the package is called the same as the python-file you want to execute, thus importing
from spellnum import spellnum_en
will try to import from the file instead of the package. You could fiddle around with relative imports, but I don't know how to make them work with __import__, so I'd suggest the following:
def __init__(self, lang="en"):
mod = "spellnum_" + lang
module = None
if __name__ == '__main__':
module = __import__(mod)
else:
package = getattr(__import__("spellnum", fromlist=[mod]), mod)

Is there a standard way to list names of Python modules in a package?

Is there a straightforward way to list the names of all modules in a package, without using __all__?
For example, given this package:
/testpkg
/testpkg/__init__.py
/testpkg/modulea.py
/testpkg/moduleb.py
I'm wondering if there is a standard or built-in way to do something like this:
>>> package_contents("testpkg")
['modulea', 'moduleb']
The manual approach would be to iterate through the module search paths in order to find the package's directory. One could then list all the files in that directory, filter out the uniquely-named py/pyc/pyo files, strip the extensions, and return that list. But this seems like a fair amount of work for something the module import mechanism is already doing internally. Is that functionality exposed anywhere?

Using python2.3 and above, you could also use the pkgutil module:
>>> import pkgutil
>>> [name for _, name, _ in pkgutil.iter_modules(['testpkg'])]
['modulea', 'moduleb']
EDIT: Note that the parameter for pkgutil.iter_modules is not a list of modules, but a list of paths, so you might want to do something like this:
>>> import os.path, pkgutil
>>> import testpkg
>>> pkgpath = os.path.dirname(testpkg.__file__)
>>> print([name for _, name, _ in pkgutil.iter_modules([pkgpath])])

import module
help(module)

Maybe this will do what you're looking for?
import imp
import os
MODULE_EXTENSIONS = ('.py', '.pyc', '.pyo')
def package_contents(package_name):
file, pathname, description = imp.find_module(package_name)
if file:
raise ImportError('Not a package: %r', package_name)
# Use a set because some may be both source and compiled.
return set([os.path.splitext(module)[0]
for module in os.listdir(pathname)
if module.endswith(MODULE_EXTENSIONS)])

Don't know if I'm overlooking something, or if the answers are just out-dated but;
As stated by user815423426 this only works for live objects and the listed modules are only modules that were imported before.
Listing modules in a package seems really easy using inspect:
>>> import inspect, testpkg
>>> inspect.getmembers(testpkg, inspect.ismodule)
['modulea', 'moduleb']

This is a recursive version that works with python 3.6 and above:
import importlib.util
from pathlib import Path
import os
MODULE_EXTENSIONS = '.py'
def package_contents(package_name):
spec = importlib.util.find_spec(package_name)
if spec is None:
return set()
pathname = Path(spec.origin).parent
ret = set()
with os.scandir(pathname) as entries:
for entry in entries:
if entry.name.startswith('__'):
continue
current = '.'.join((package_name, entry.name.partition('.')[0]))
if entry.is_file():
if entry.name.endswith(MODULE_EXTENSIONS):
ret.add(current)
elif entry.is_dir():
ret.add(current)
ret |= package_contents(current)
return ret

There is a __loader__ variable inside each package instance. So, if you import the package, you can find the "module resources" inside the package:
import testpkg # change this by your package name
for mod in testpkg.__loader__.get_resource_reader().contents():
print(mod)
You can of course improve the loop to find the "module" name:
import testpkg
from pathlib import Path
for mod in testpkg.__loader__.get_resource_reader().contents():
# You can filter the name like
# Path(l).suffix not in (".py", ".pyc")
print(Path(mod).stem)
Inside the package, you can find your modules by directly using __loader__ of course.

This should list the modules:
help("modules")

If you would like to view an inforamtion about your package outside of the python code (from a command prompt) you can use pydoc for it.
# get a full list of packages that you have installed on you machine
$ python -m pydoc modules
# get information about a specific package
$ python -m pydoc <your package>
You will have the same result as pydoc but inside of interpreter using help
>>> import <my package>
>>> help(<my package>)

Based on cdleary's example, here's a recursive version listing path for all submodules:
import imp, os
def iter_submodules(package):
file, pathname, description = imp.find_module(package)
for dirpath, _, filenames in os.walk(pathname):
for filename in filenames:
if os.path.splitext(filename)[1] == ".py":
yield os.path.join(dirpath, filename)

The other answers here will run the code in the package as they inspect it. If you don't want that, you can grep the files like this answer
def _get_class_names(file_name: str) -> List[str]:
"""Get the python class name defined in a file without running code
file_name: the name of the file to search for class definitions in
return: all the classes defined in that python file, empty list if no matches"""
defined_class_names = []
# search the file for class definitions
with open(file_name, "r") as file:
for line in file:
# regular expression for class defined in the file
# searches for text that starts with "class" and ends with ( or :,
# whichever comes first
match = re.search("^class(.+?)(\(|:)", line) # noqa
if match:
# add the cleaned match to the list if there is one
defined_class_name = match.group(1).strip()
defined_class_names.append(defined_class_name)
return defined_class_names

To complete #Metal3d answer, yes you can do testpkg.__loader__.get_resource_reader().contents() to list the "module resources" but it will work only if you imported your package in the "normal" way and your loader is _frozen_importlib_external.SourceFileLoader object.
But if you imported your library with zipimport (ex: to load your package in memory), your loader will be a zipimporter object, and its get_resource_reader function is different from importlib; it will require a "fullname" argument.
To make it work in these two loaders, just specify your package name in argument to get_resource_reader :
# An example with CrackMapExec tool
import importlib
import cme.protocols as cme_protocols
class ProtocolLoader:
def get_protocols(self):
protocols = {}
protocols_names = [x for x in cme_protocols.__loader__.get_resource_reader("cme.protocols").contents()]
for prot_name in protocols_names:
prot = importlib.import_module(f"cme.protocols.{prot_name}")
protocols[prot_name] = prot
return protocols

def package_contents(package_name):
package = __import__(package_name)
return [module_name for module_name in dir(package) if not module_name.startswith("__")]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Packaging a Python library - python

Related

Can I add an include path in a distutils command?

From folder_name import variable Python 3.4.2

py2app picking up .git subdir of a package during build

Hiding implementation files in a package

Is there a standard way to list names of Python modules in a package?

Categories

Resources

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Packaging a Python library - python

Related

Can I add an include path in a distutils command?

From *folder_name* import *variable* Python 3.4.2

py2app picking up .git subdir of a package during build

Hiding implementation files in a package

Is there a standard way to list names of Python modules in a package?

Categories

Resources

From folder_name import variable Python 3.4.2