Cython bdist_egg with setuptools creates invalid package

I am attempting to compile a *.pyx file. It uses some definitions and constants inside an __init__.py in the same directory. The project structure is:
setup.py
Foo/__init__.py
Foo/Foo.pyx
and the setup command is as follows:
from setuptools import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension(name='Foo', sources=['Foo/Foo.pyx'])],
    include_dirs=[numpy.get_include()],
    name='Foo',
    packages=['Foo'],
    zip_safe=True
)
The problem arises when the egg is built and deployed. The resulting egg has the following structure:
Foo.so
Foo.py
Foo/__init__.py
Now, Foo.py contains some dynamic import code that basically imports the *.so file. However, because of the presence of Foo/__init__.py, import Foo attempts to import symbols only from __init__.py, which contains just some constants (all the relevant code is actually in Foo.so).
I've hacked around this issue by pasting all the definitions from __init__.py into Foo.pyx, but I'm trying to figure out what a proper solution might be.
Any advice is appreciated!

I tracked down my problem to an extraneous argument to the setup() command. Judging by the documentation at https://docs.python.org/2/distutils/setupscript.html, I do not need the packages=['Foo'] argument, and in fact that's what's causing it to create the inner Foo package that's messing everything up.
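For reference, this is a minimal sketch of the working setup.py with the packages=['Foo'] argument removed (otherwise identical to the script above):

from setuptools import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass={'build_ext': build_ext},
    # the compiled extension becomes the top-level module Foo
    ext_modules=[Extension(name='Foo', sources=['Foo/Foo.pyx'])],
    include_dirs=[numpy.get_include()],
    name='Foo',
    zip_safe=True
)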

Related

Cannot Import F2PY Generated Module in __init__.py

I am wrapping a Fortran library with f2py; so far it has been successful in terms of getting the functions to work within Python.
I am at the point of structuring my package and am running into __init__.py issues when using these f2py-generated modules. This is related to a package constructed with distutils extensions for f2py.
After installing the egg, in an external script I can import the f2py-generated module directly:
from package.core.module1 import fortran_fn
but when I place
from module1 import fortran_fn
in package/core/__init__.py in order to refactor the namespace for users,
the package import fails:
----> 1 import package.core as core

~/venv/dev/lib/python3.5/site-packages/package-py3.5-linux-x86_64.egg/package/core/__init__.py in <module>()
----> 3 from module1 import fortran_fn

ImportError: No module named 'module1'
However, there is one exception: for whatever reason, one module in particular does wish to import itself.
Project Structure:
setup.py
package/
    __init__.py
    core/
        __init__.py
        module1.pyf
        module2.pyf
        ...
src/
    lib/
        module1.f
        module2.f
        ...
The *.pyf files were generated with f2py -h package/core/moduleX.pyf src/lib/moduleX.f -m moduleX
Further,
#core/__init__.py
from module1 import fortran_fn
and,
#setup.py
from setuptools import find_packages
from numpy.distutils.core import setup, Extension  # note: setup was missing from the original imports

ext1 = Extension(name='package.core.module1',
                 sources=['package/core/module1.pyf',
                          'src/lib/module1.f'])
#ext2 = ...
# and others

setup(name="package",
      packages=find_packages(),
      ext_modules=[ext1])  # ..., ext2, ...
The code samples are simplified to illustrate the problem; the specific code is available at this GitHub.
For some context, I initially tried to have f2py place the Fortran code into the .core subpackage directly, by compiling with -m package.core (and likewise name='package.core' for the extension); however, f2py would complain about missing names.
In this particular structure there is an additional layer where each Fortran module is just one function, and I would like to remove the extra submodule layer from the API.
Using a fully qualified name in core/__init__.py will resolve the immediate import problem.
Instead of:
#core/__init__.py
from module1 import fortran_fn
This will work:
#core/__init__.py
from package.core.module1 import fortran_fn
then
> import package.core as c
> c.fortran_fn()
SUCCESS
It appears the f2py extensions do not register as subpackages/submodules when distutils runs. I may be mistaken in that interpretation.
In Python 3 all imports are absolute. To perform a relative import, do it explicitly:
from .module1 import fortran_fn
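Putting the two answers together, either explicit form in package/core/__init__.py resolves the ImportError; a minimal sketch:

#core/__init__.py
# absolute import, as in the accepted fix:
from package.core.module1 import fortran_fn

# or, equivalently, an explicit relative import (Python 3):
# from .module1 import fortran_fn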

How to import functions from a submodule in a Python egg?

I have a custom Python egg I've written, but I cannot get the submodule (or nested module) to load externally. I've been able to load a root-level module just fine.
Here's the obfuscated structure:
my_egg/
    my_egg/
        __init__.py (empty file)
        module_one.py
        my_subdir/
            __init__.py (empty file)
            module_two.py
    setup.py
Importing module_one works:
from my_egg import module_one
But I cannot seem to get module_two to import. Here's what I've tried:
from my_egg.my_subdir import module_two
from my_egg import my_subdir.module_two
from my_egg.my_subdir.module_two import *
None of those worked. Here's what my setup.py looks like:
from setuptools import setup

setup(name='my_egg',
      version='0.1',
      packages=['my_egg'],
      test_suite='nose.collector',
      tests_require=['nose'],
      zip_safe=False)
I'm surprised no one answered this. I was able to get it working after scouring Google, pulling from different sources, and trying different things.
One thing that held me up: I was trying to install my custom egg on a Databricks cluster, and I didn't realize that once you delete a library (egg), the cluster must be restarted for the deletion to take effect. So every time I tried changes, nothing took effect. This definitely delayed my progress.
In any case, I changed my setup.py file to use find_packages and made changes to the empty __init__.py files. I'm not really sure if both changes were needed, or if one would've sufficed.
New my_egg/setup.py:
exec(open('my_egg/version.py').read())
from setuptools import setup, find_packages
setup(name='my_egg',
version=__version__,
packages=find_packages(exclude=('tests', 'docs')),
test_suite='nose.collector',
tests_require=['nose'],
zip_safe=False)
I added a my_egg/version.py file to help me debug if I was using the right version on the cluster. That addition actually led me to discover that Databricks requires the cluster be restarted.
New root init my_egg/my_egg/__init__.py file:
from .version import __version__
from .module_one import module_one_func
from .my_subdir.module_two import module_two_func
__all__ = ['module_one_func']
New sub-dir init my_egg/my_egg/my_subdir/__init__.py:
from .module_two import module_two_func
__all__ = ['module_two_func']
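With those __init__.py files in place, the re-exported functions should be importable straight from the package roots. A quick sketch of the imports that should now work (after reinstalling the egg and restarting the cluster):

from my_egg import module_one_func
from my_egg.my_subdir import module_two_func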

Package only binary compiled .so files of a python library compiled with Cython

I have a package named mypack which contains a module mymod.py and the __init__.py.
For some reason that is not in debate, I need to package this module compiled (neither .py nor .pyc files are allowed). That is, the __init__.py is the only source file allowed in the distributed compressed file.
The folder structure is:
.
├── mypack
│   ├── __init__.py
│   └── mymod.py
├── setup.py
I find that Cython is able to do this, by converting each .py file into a .so library that can be directly imported with Python.
The question is: what must the setup.py file look like, in order to allow easy packaging and installation?
The target system has a virtualenv where the package must be installed with whatever method allows easy install and uninstall (easy_install, pip, etc. are all welcome).
I tried everything within my reach. I read the setuptools and distutils documentation, all related Stack Overflow questions, and tried all kinds of commands (sdist, bdist, bdist_egg, etc.), with lots of combinations of setup.cfg and MANIFEST.in file entries.
The closest I got was with the setup file below, which subclasses the bdist_egg command in order to also remove .pyc files, but that breaks the installation.
A solution that "manually" installs the files in the venv is also good, provided that all ancillary files included in a proper installation are covered (I need to run pip freeze in the venv and see mymod==0.0.1).
Run it with:
python setup.py bdist_egg --exclude-source-files
and (try to) install it with
easy_install mymod-0.0.1-py2.7-linux-x86_64.egg
As you may notice, the target is linux 64 bits with python 2.7.
from Cython.Distutils import build_ext
from setuptools import setup, find_packages
from setuptools.extension import Extension
from setuptools.command import bdist_egg
from setuptools.command.bdist_egg import walk_egg, log
import os

class my_bdist_egg(bdist_egg.bdist_egg):
    def zap_pyfiles(self):
        log.info("Removing .py files from temporary directory")
        for base, dirs, files in walk_egg(self.bdist_dir):
            for name in files:
                if not name.endswith('__init__.py'):
                    if name.endswith('.py') or name.endswith('.pyc'):
                        # original 'if' only has name.endswith('.py')
                        path = os.path.join(base, name)
                        log.info("Deleting %s", path)
                        os.unlink(path)

ext_modules = [
    Extension("mypack.mymod", ["mypack/mymod.py"]),
]

setup(
    name='mypack',
    cmdclass={'build_ext': build_ext,
              'bdist_egg': my_bdist_egg},
    ext_modules=ext_modules,
    version='0.0.1',
    description='This is mypack compiled lib',
    author='Myself',
    packages=['mypack'],
)
UPDATE:
Following @Teyras's answer, it was possible to build a wheel as requested in the answer. The setup.py file contents are:
import os
import shutil
from setuptools.extension import Extension
from setuptools import setup
from Cython.Build import cythonize
from Cython.Distutils import build_ext

class MyBuildExt(build_ext):
    def run(self):
        build_ext.run(self)
        build_dir = os.path.realpath(self.build_lib)
        root_dir = os.path.dirname(os.path.realpath(__file__))
        target_dir = build_dir if not self.inplace else root_dir
        self.copy_file('mypack/__init__.py', root_dir, target_dir)

    def copy_file(self, path, source_dir, destination_dir):
        if os.path.exists(os.path.join(source_dir, path)):
            shutil.copyfile(os.path.join(source_dir, path),
                            os.path.join(destination_dir, path))

setup(
    name='mypack',
    cmdclass={'build_ext': MyBuildExt},
    ext_modules=cythonize([Extension("mypack.*", ["mypack/*.py"])]),
    version='0.0.1',
    description='This is mypack compiled lib',
    author='Myself',
    packages=[],
    include_package_data=True)
The key point was to set packages=[]. Overriding the run method of the build_ext class was needed to get the __init__.py file inside the wheel.
Unfortunately, the answer suggesting setting packages=[] is wrong and may break a lot of stuff, as can e.g. be seen in this question. Don't use it. Instead of excluding all packages from the dist, you should exclude only the python files that will be cythonized and compiled to shared objects.
Below is a working example; it uses my recipe from the question Exclude single source file from python bdist_egg or bdist_wheel. The example project contains package spam with two modules, spam.eggs and spam.bacon, and a subpackage spam.fizz with one module spam.fizz.buzz:
root
├── setup.py
└── spam
    ├── __init__.py
    ├── bacon.py
    ├── eggs.py
    └── fizz
        ├── __init__.py
        └── buzz.py
The module lookup is being done in the build_py command, so it is the one you need to subclass with custom behaviour.
Simple case: compile all source code, make no exceptions
If you are about to compile every .py file (including the __init__.pys), it is already sufficient to override the build_py.build_packages method, making it a no-op. Because build_packages doesn't do anything, no .py file will be collected at all, and the dist will include only the cythonized extensions:
import fnmatch
from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize

extensions = [
    # example of an extension with glob-style sources
    Extension('spam.*', ['spam/*.py']),
    # example of an extension with a single source file
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]

class build_py(build_py_orig):
    def build_packages(self):
        pass

setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions),
    cmdclass={'build_py': build_py},
)
Complex case: mix cythonized extensions with source modules
If you want to compile only selected modules and leave the rest untouched, you will need a bit more complex logic; in this case, you need to override module lookup. In the below example, I still compile spam.bacon, spam.eggs and spam.fizz.buzz to shared objects, but leave __init__.py files untouched, so they will be included as source modules:
import fnmatch
from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize

extensions = [
    Extension('spam.*', ['spam/*.py']),
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]
cython_excludes = ['**/__init__.py']

def not_cythonized(tup):
    (package, module, filepath) = tup
    return any(
        fnmatch.fnmatchcase(filepath, pat=pattern) for pattern in cython_excludes
    ) or not any(
        fnmatch.fnmatchcase(filepath, pat=pattern)
        for ext in extensions
        for pattern in ext.sources
    )

class build_py(build_py_orig):
    def find_modules(self):
        modules = super().find_modules()
        return list(filter(not_cythonized, modules))

    def find_package_modules(self, package, package_dir):
        modules = super().find_package_modules(package, package_dir)
        return list(filter(not_cythonized, modules))

setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions, exclude=cython_excludes),
    cmdclass={'build_py': build_py},
)
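If you want to sanity-check what actually landed in the dist, listing the wheel's contents is enough; a minimal sketch (the wheel filename is hypothetical, substitute whatever your build produced under dist/):

import zipfile

# hypothetical filename; use the wheel your build actually produced
with zipfile.ZipFile('dist/spam-0.1-cp36-cp36m-linux_x86_64.whl') as whl:
    for name in sorted(whl.namelist()):
        # expect .so entries for the cythonized modules, and .py only for the __init__ files
        print(name)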
While packaging as a wheel is definitely what you want, the original question was about excluding .py source files from the package. This is addressed in Using Cython to protect a Python codebase by @Teyras, but his solution uses a hack: it removes the packages argument from the call to setup(). This prevents the build_py step from running, which does indeed exclude the .py files, but it also excludes any data files you want included in the package. (For example, my package has a data file called VERSION which contains the package version number.) A better solution would be replacing the build_py setup command with a custom command which only copies the data files.
You also need the __init__.py file as described above, so the custom build_py command should create the __init__.py file. I found that the compiled __init__.so runs when the package is imported, so all that is needed is an empty __init__.py file to tell Python that the directory is a package which is OK to import.
Your custom build_py class would look like:
import os
from setuptools.command.build_py import build_py

class CustomBuildPyCommand(build_py):
    def run(self):
        # package data files but not .py files
        build_py.build_package_data(self)
        # create an empty __init__.py in each target dir
        # (dotted package names map to nested directories)
        for pdir in self.packages:
            pkg_dir = os.path.join(self.build_lib, pdir.replace('.', os.sep))
            open(os.path.join(pkg_dir, '__init__.py'), 'a').close()
And configure setup to override the original build_py command:
setup(
    ...
    cmdclass={'build_py': CustomBuildPyCommand},
)
I suggest you use the wheel format (as suggested by fish2000). Then, in your setup.py, set the packages argument to []. Your Cython extension will still build and the resulting .so files will be included in the resulting wheel package.
If your __init__.py is not included in the wheel, you can override the run method of the build_ext class shipped by Cython and copy the file from your source tree to the build folder (the path can be found in self.build_lib).
This was exactly the sort of problem the Python wheels format – described in PEP 427 – was developed to address.
Wheels are a replacement for Python eggs (which were/are problematic for a bunch of reasons) – they are supported by pip, can contain architecture-specific private binaries (here is one example of such an arrangement), and are generally accepted by the Python communities who have stakes in these kinds of things.
Here is one setup.py snippet from the aforelinked Python on Wheels article, showing how one sets up a binary distribution:
import os
from setuptools import setup
from setuptools.dist import Distribution

class BinaryDistribution(Distribution):
    def is_pure(self):
        return False

setup(
    ...,
    include_package_data=True,
    distclass=BinaryDistribution,
)
… in lieu of the older (but probably somehow still canonically supported) setuptools classes you are using. It's very straightforward to make wheels for your distribution purposes, as outlined – as I recall from experience, either the wheel module's build process is somewhat cognizant of virtualenv, or it's very easy to use one within the other.
In any case, trading in the setuptools egg-based APIs for wheel-based tooling should save you some serious pain, I should think.

Trouble importing extension compiled with numpy.distutils

I have a project directory structure:
myproject/
    setup.py
    myproject/
        editors/
            ....
        utilities/
            ...
            find_inf.f90
All the files in the project are Python, except for the one Fortran file I have indicated. Now, I can use setuptools to install my project without the Fortran file just fine, but to include the Fortran file I have to use numpy.distutils.core.Extension. So I have a setup file like this:
from setuptools import find_packages
from numpy.distutils.core import Extension

ext1 = Extension(name='myproject.find_inf',
                 sources=['myproject/utilities/find_inf.f90'],
                 extra_f90_compile_args=['-fopenmp'],
                 libraries=['gomp'])

if __name__ == "__main__":
    from numpy.distutils.core import setup
    setup(name='myproject',
          packages=find_packages(),
          package_data={
              ......
          },
          entry_points={
              'console_scripts': [....]
          },
          ext_modules=[ext1])
This creates and installs myproject-2.0-py2.7-macosx-10.6-x86_64.egg under the site-packages directory and the directory structure looks like:
myproject-2.0-py2.7-macosx-10.6-x86_64.egg/
    myproject/
        editors/
        find_inf.pyc
        find_inf.so.dSYM/
        find_inf.py
        find_inf.so*
        __init__.py
        __init__.pyc
So it looks to me like I should be able to import find_inf from myproject. But I can't! Writing from myproject import find_inf produces an ImportError. What am I doing wrong?
UPDATE:
If I change the name of the extension from myproject.find_inf to just find_inf, then the installation puts the extension directly in myproject-2.0-py2.7-macosx-10.6-x86_64.egg. If I then manually move the find_inf files from there into myproject-2.0-py2.7-macosx-10.6-x86_64.egg/myproject, I can import the extension. I still can't make sense of this. Something is clearly wrong in my setup.py; it is not putting the extension in the right place.
UPDATE:
Figured it out. Answer below.
Okay, figured it out. The name= argument to Extension should have been name='myproject.utilities.find_inf'.
Reason: the issue is that there is no package named myproject.find_inf that setup() is aware of. The packages= argument to setup() takes the names of the packages (as a list), and myproject.find_inf wasn't in the list because there is no directory under myproject called find_inf (as the directory structure in my question shows). This answer sheds important light on the issue. In order to have compiled extensions placed under an appropriate sub-package, one needs:
those sub-packages to be present in the source directory structure,
__init__.py files to exist in those packages, and
the names of those packages to be passed to the packages= argument of setup().
i.e. simply describing the extension hierarchy in the call to Extension() is not enough.
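Concretely, only the name= argument of the Extension from the question's setup.py needs to change (the sources and flags stay as they were, and find_packages() will pick up the sub-package once myproject/utilities/__init__.py exists):

ext1 = Extension(name='myproject.utilities.find_inf',  # was 'myproject.find_inf'
                 sources=['myproject/utilities/find_inf.f90'],
                 extra_f90_compile_args=['-fopenmp'],
                 libraries=['gomp'])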

ImportError on subpackage when running setup.py test

I'm trying to create an install package for a Python project with included unit tests. My project layout is as follows:
setup.py
src/
    disttest/
        __init__.py
        core.py
tests/
    disttest/
        __init__.py
        testcore.py
My setup.py looks like this:
from distutils.core import setup
import setuptools

setup(name='disttest',
      version='0.1',
      package_dir={'': 'src'},
      packages=setuptools.find_packages('src'),
      test_suite='nose.collector',
      tests_require=['Nose'],
      )
The file tests/disttest/testcore.py contains the line from disttest.core import DistTestCore.
Running setup.py test now gives an ImportError: No module named core.
After a setup.py install, python -c "from disttest.core import DistTestCore" works fine. It also works if I put import core into src/disttest/__init__.py, but I don't really want to maintain that and it only seems necessary for the tests.
Why is that? And what is the correct way to fix it?
You may want to double-check this, but it looks like your tests are importing the disttest package in the tests/ directory, instead of the package-under-test from the src/ directory.
Why do you need to use a package with the same name as the package-under-test? I'd simply move the testcore module up to the tests directory, or rename the tests/disttest package and avoid the potential naming conflict altogether.
In any case, you want to insert an import pdb; pdb.set_trace() line just before the failing import and play around with different import statements to see what is being imported from where (import sys; sys.modules['modulename'].__file__ is your friend), so you get better insight into what is going wrong.
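For instance, a minimal debugging sketch along those lines (the pdb session shown is illustrative):

# place just before the failing import in tests/disttest/testcore.py
import pdb; pdb.set_trace()

# then, at the (Pdb) prompt:
# (Pdb) import disttest
# (Pdb) import sys; sys.modules['disttest'].__file__
# if the path points into tests/ rather than src/, the test package is
# shadowing the package under test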
