Python Import cannot find shared object file (C++ extension)

I'm creating a C++ extension module for Python 3. Building with setup.py compiles just fine, but when I go to import my new module, I get
ImportError: libMyLib.so: cannot open shared object file: No such file or directory
this is the path to my .so:
/path/to/lib-cc7/libMyLib.so
I've tried referencing my libraries in setup.py in different ways, I have tried setting and re-setting my LD_LIBRARY_PATH variable in the terminal as well as in my .bash_profile, and I have tried adding the paths to sys.path.
When I run this code before the import statement:
print(os.environ.get("LD_LIBRARY_PATH"))
print(os.environ.get("PYTHONPATH"))
I get the path to the correct library directory.
When I run strace, the paths to the other .so files that I need show up, and I see it searching for libMyLib.so, but it searches what seems like every other directory except /path/to/lib-cc7/. For other libraries it does check /path/to/lib-cc7/.
I have sanity checked that the library is there about 5 times.
It seems like no matter what I do,
import MyModule.MySubModule as SubModule
always returns the same import error. Is there anything else that I haven't tried? Why does it seem like Python is looking in the wrong place for my .so?
EDIT 1:
This is what my setup.py (in essence) looks like:
Submodule1 = Extension('Module.Submodule1', sources=['Module/Submodule1/submod1.cpp'], language='c++', libraries=[..long list of libraries..],)
Submodule2 = Extension('Module.Submodule2', sources=['Module/Submodule2/submod2.cpp'], language='c++', libraries=[..long list of libraries..],)
setup(name="Module", version='1.0',
      packages=['Module', 'Module.Submodule1', 'Module.Submodule2'],
      ext_modules=[Submodule1, Submodule2], )
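For reference, a minimal sketch of how both the link-time and run-time search paths for libMyLib.so could be passed to an Extension, so the built module does not rely on LD_LIBRARY_PATH at import time. The directory and names are the ones from the question; library_dirs and runtime_library_dirs are standard Extension arguments, but whether this actually resolves the import depends on how the extension is linked against libMyLib.so:

from setuptools import setup, Extension

# Sketch: point both the linker and the runtime loader at the directory holding libMyLib.so.
Submodule1 = Extension('Module.Submodule1',
                       sources=['Module/Submodule1/submod1.cpp'],
                       language='c++',
                       libraries=['MyLib'],                        # links against libMyLib.so
                       library_dirs=['/path/to/lib-cc7'],          # searched at link time
                       runtime_library_dirs=['/path/to/lib-cc7'])  # embedded as an rpath, searched at import time

setup(name='Module', version='1.0', ext_modules=[Submodule1])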

Related

Why does calling my FORTRAN DLL in python require the python entry point to be in the DLL's parent directory?

I have a Fortran library compiled with gfortran into a DLL.
I am trying to call this DLL from Python 3.9, and that can work: I can access the expected functions, so this issue has nothing to do with ctypes itself.
Additionally, I know that the DLL works and can be called from Python; another project uses the same DLL but has a flat folder structure (possibly to work around this issue). However, I need it to be in a Python package.
This main DLL has a few dependencies that need to be shipped with it. These dependencies must be in the parent directory of the main DLL (why I have no idea but that is the only way it works).
The issue occurs when trying to use this DLL in my python package.
If the entry point of the Python code that calls the DLL is in the parent directory of the DLL, then I can access the expected functions; if it is anywhere else, I get the following error:
FileNotFoundError: Could not find module 'I:\<full-path>\wrappers\lib\foo.dll' (or one of its dependencies). Try using the full path with constructor syntax.
On the line:
self._libHandle = LoadLibrary(str(self._dll_path))
self._dll_path is a Path object with the absolute path to the DLL; I check that the file exists before passing it to LoadLibrary.
I have the following directory structure (additional files removed for brevity):
src
|---entry.py
|---wrappers
| |---lib
| | |---foo.dll
| |---dep1.dll
| |---dep2.dll
| |---foo-wrapper.py
| |---adj-entry.py
If I add some test code to the bottom of foo-wrapper.py, then I can access my DLL; if I import foo-wrapper into entry.py, I get the error above. Running the same code from adj-entry.py instead of entry.py works absolutely fine. The test code is shown below.
from src.wrappers import Foo
from pathlib import Path
dll_path = Path("../../src/wrappers/lib/foo.dll").resolve() # This path is the only thing adjusted between entry.py and adj-entry.py. Remove 1 ../ for entry.py
assert dll_path.exists()
assert dll_path.is_file()
f = Foo(dll_path)
The only thing that seems to change is the file that python.exe is actually invoked with. When that file is in the DLL's parent directory everything works; if it is anywhere else, I get the dependency error.
Does anyone know how I can call this DLL from anywhere?
Could this be related to the gfortran build or the Fortran code itself?
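One thing worth checking (a sketch under the assumption that LoadLibrary here goes through ctypes, not a confirmed fix): since Python 3.8 on Windows, dependencies of a DLL loaded from Python are no longer resolved via PATH or the current directory, so the directories holding foo.dll and dep1.dll/dep2.dll may need to be registered explicitly with os.add_dll_directory before the DLL is loaded, e.g. from foo-wrapper.py:

import os
from pathlib import Path
from ctypes import WinDLL

# Sketch only: register the directories that contain foo.dll and its
# dependencies so the Windows loader can resolve them no matter which
# script started the interpreter.
dll_path = Path(__file__).parent / "lib" / "foo.dll"   # layout from the question
os.add_dll_directory(str(dll_path.parent))             # ...\wrappers\lib
os.add_dll_directory(str(dll_path.parent.parent))      # ...\wrappers (dep1.dll, dep2.dll)

lib = WinDLL(str(dll_path))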

SWIG C++ to python with multiple extensions in module on Ubuntu (vs MacOS)

I am trying to run C++ code, wrapped with SWIG into Python 3.8 modules, on a remote computing cluster (where I don't have root access). I wrote the code on my own computer (macOS Monterey) while the cluster runs Ubuntu 20.04. The code makes use of the libraries quadedge, nlopt and gsl. Using the latter two I perform an optimisation with nlopt and an integration with gsl every step; however, this only works on my own Mac. On the Ubuntu cluster there is some undefined behaviour at runtime, i.e. my rate of change when integrating is zero at around 50% of the integration steps (on the Mac everything works flawlessly).
To install my code as python modules, I use a setup.py. The procedure and module structure I use is currently the following:
install quadedge
install polygeo (own module that depends on quadedge)
install plantdevelopment (own module that depends on polygeo)
Both polygeo and plantdevelopment contain two extensions in one module. My setup.py for polygeo on my Mac looks like this:
from setuptools import setup, find_packages, Extension
from setuptools.command.build_py import build_py as _build_py
import os
import glob
class build_py(_build_py):
    def run(self):
        self.run_command("build_ext")
        return super().run()
polygeoDir = 'polygeo/'
surfnloptDir = 'polygeo/surfnlopt/'
includeDir = 'polygeo/include/'
srcDir = 'polygeo/src/'
headerFiles = glob.glob(includeDir + '*.hpp')
srcFiles = glob.glob(srcDir + '*.cpp')
srcFiles.append(polygeoDir + 'polygeo.i')
surfnloptHeaderFiles = glob.glob(surfnloptDir + 'include/*.hpp')
surfnloptSrcFiles = glob.glob(surfnloptDir + 'src/*.cpp')
surfnloptSrcFiles.append(surfnloptDir + 'surfnlopt.i')
allHeaders = []
allHeaders += headerFiles
allHeaders += surfnloptHeaderFiles
polygeoExt = Extension('_polygeo',
                       sources=srcFiles,
                       include_dirs=[includeDir],
                       library_dirs=[],
                       libraries=[],
                       swig_opts=['-c++'],
                       extra_compile_args=['-std=c++11'],)
surfnloptExt = Extension('surfnlopt._surfnlopt',
                         sources=surfnloptSrcFiles,
                         include_dirs=[includeDir,
                                       surfnloptDir + '/include'],
                         library_dirs=[],
                         libraries=['m', 'nlopt'],
                         swig_opts=['-c++'],
                         extra_compile_args=['-std=c++11'],)
setup(name='polygeo',
      version='1.0',
      packages=find_packages(),
      ext_package='polygeo',
      ext_modules=[polygeoExt,
                   surfnloptExt],
      install_requires=['quadedge'],
      headers=allHeaders,
      cmdclass={'build_py': build_py},
      )
Interestingly, on my Mac it doesn't seem necessary to add extra paths or anything, but on the remote machine I have to edit the .bash_profile in order to set the necessary include paths etc.
On top of that, on Ubuntu I also have to add to each of the extensions extra_link_args=['/home/.local/lib/python3.8/site-packages/quadedge/_quadedge.cpython-38-x86_64-linux-gnu.so'], pointing to the cpython .so that I already generated with the setup.py for quadedge.
Also, I need to add the source files of the polygeo extension to the source files of surfnlopt, i.e. sources=srcFiles + surfnloptSrcFiles, otherwise it throws a "symbols not found" error when importing the modules in Python, because surfnlopt depends on the source files of polygeo (the same happens if I don't include the .so, as it also needs functions from quadedge).
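As a concrete illustration of that workaround (this only restates what is described above, reusing the names from the setup.py shown earlier; the .so path is the one from this question and quadedgeSo is just a name made up for the example):

quadedgeSo = ('/home/.local/lib/python3.8/site-packages/quadedge/'
              '_quadedge.cpython-38-x86_64-linux-gnu.so')

# surfnlopt rebuilt with the polygeo sources added and the already-built
# quadedge extension handed to the linker, as currently needed on Ubuntu.
surfnloptExt = Extension('surfnlopt._surfnlopt',
                         sources=srcFiles + surfnloptSrcFiles,
                         include_dirs=[includeDir, surfnloptDir + '/include'],
                         libraries=['m', 'nlopt'],
                         swig_opts=['-c++'],
                         extra_compile_args=['-std=c++11'],
                         extra_link_args=[quadedgeSo])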
For installing plantdevelopment my script looks almost the same; however, there I have to include the polygeo .so instead of the quadedge .so, and have to add swig_opts=['-c++', '-I' + polygeoIncDir], which points to the directory where the header files from the polygeo setup end up.
The first thing I was wondering is whether it causes problems to have the extensions depend on the previously installed ones, since plantdevelopment then also depends on polygeo and its extensions.
Could adding the source files to the extensions again have an impact due to ambiguity/duplication when running the code? Could this be a memory leak?
So far I tried the following:
tried to have as few includes/sources as possible to avoid overdefinition in the setup.py
tried to install Python locally on the Ubuntu remote with Homebrew and pyenv, with all software (and the compiler) at the same versions as on the Mac
tried out python 3.10 as there apparently used to be a memory leak issue at some point
tried to install everything on Mac without root privileges (still works flawlessly)
tried to put ALL files into one single module (which doesn't work due to abstract classes and the C++ code structure)
tested that the C++ code without python is working properly to make sure it's not about the C libraries (unfortunately I need the python API for the project and it's not my choice)
Using the guppy memory profiler, some of the runs (~50%) actually work out properly just from importing it and printing the heap status; however, there seems to be a lot of randomness...
In my program I use guppy the following way:
from guppy import hpy

heap = hpy()
print("Heap Status At Starting : ")
heap_status1 = heap.heap()
print("Heap Size : ", heap_status1.size, " bytes\n")
print(heap_status1)
heap.setref()
print("\nHeap Status After Setting Reference Point : ")
heap_status2 = heap.heap()
print("Heap Size : ", heap_status2.size, " bytes\n")
print(heap_status2)
I can't really think of a reason why just importing guppy and printing the heap info should change the outcome of my optimization/integration.
Could this be an issue with the different OS?
Or maybe the code structure is problematic with Python, building everything on top of each other / having dependencies between extensions in a single module?
I'd be happy for any kind of help and will of course provide any further necessary info.

Compiling an optional cython extension only when possible in setup.py

I have a Python module fully implemented in Python (for portability reasons).
The implementation of a small part has been duplicated in a Cython module, to improve performance where possible.
I know how to install the .c modules created by Cython with distutils. However, if a machine has no compiler installed, I suspect the setup will fail even though the module is still usable in pure Python mode.
Is there a way to compile the .c module if possible but fail gracefully and install without it if compiling is not possible?
I guess you will have to make some modifications, both in your setup.py and in one __init__.py file in your module.
Let's say the name of your package will be "module" and you have a functionality, sub, for which you have pure Python code in the sub subfolder and the equivalent C code in the c_sub subfolder.
For example in your setup.py :
import logging
from setuptools import setup
from setuptools.extension import Extension
from setuptools.command.build_ext import build_ext
from distutils.errors import CCompilerError, DistutilsExecError, DistutilsPlatformError

logging.basicConfig()
log = logging.getLogger(__file__)

ext_errors = (CCompilerError, DistutilsExecError, DistutilsPlatformError, IOError, SystemExit)

setup_args = {'name': 'module', 'license': 'BSD', 'author': 'xxx',
              'packages': ['module', 'module.sub', 'module.c_sub'],
              'cmdclass': {'build_ext': build_ext}
              }
ext_modules = [Extension("module.c_sub._sub", ["module/c_sub/_sub.c"])]

try:
    # Try building with the C extension:
    setup(ext_modules=ext_modules, **setup_args)
except ext_errors as ex:
    log.warn(ex)
    log.warn("The C extension could not be compiled")

    ## Retry installing the module without the C extension:
    # Remove any previously defined build_ext command class.
    if 'build_ext' in setup_args['cmdclass']:
        del setup_args['cmdclass']['build_ext']

    # If this new 'setup' call doesn't fail, the module
    # will be successfully installed, without the C extension:
    setup(**setup_args)
    log.info("Plain-Python installation succeeded.")
Now you will need to include something like this in your __init__.py file (or at any place relevant in your case):
try:
    from .c_sub import *
except ImportError:
    from .sub import *
In this way the C version will be used if it was built, otherwise the plain Python version is used. This assumes that sub and c_sub provide the same API.
You can find an example of a setup file doing things this way in the Shapely package. Actually most of the code I posted was copied (the construct_build_ext function) or adapted (the lines after it) from that file.
The Extension class has an optional parameter in its constructor:
optional - specifies that a build failure in the extension should not
abort the build process, but simply skip the extension.
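A minimal sketch of a setup.py using that parameter instead of the try/except around setup() (it assumes the same fallback import in __init__.py shown above):

from setuptools import setup, Extension

# optional=True: if this extension fails to build, it is skipped instead of
# aborting the whole installation.
ext_modules = [Extension("module.c_sub._sub",
                         ["module/c_sub/_sub.c"],
                         optional=True)]

setup(name="module",
      packages=["module", "module.sub", "module.c_sub"],
      ext_modules=ext_modules)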
Here is also a link to the quite interesting history of the piece of code proposed by mgc.
The question "How should I structure a Python package that contains Cython code" is related; there the question is how to fall back from Cython to the already generated C code. You could use a similar strategy to select whether the .py or the .pyx code gets installed.
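A sketch of that kind of selection in setup.py, assuming the generated _sub.c is shipped alongside _sub.pyx:

from setuptools import setup, Extension

# Prefer building from the .pyx when Cython is installed; otherwise fall back
# to the C file that Cython generated earlier and that ships with the package.
try:
    from Cython.Build import cythonize
    ext_modules = cythonize([Extension("module.c_sub._sub",
                                       ["module/c_sub/_sub.pyx"])])
except ImportError:
    ext_modules = [Extension("module.c_sub._sub",
                             ["module/c_sub/_sub.c"])]

setup(name="module", ext_modules=ext_modules)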

correct way to find scripts directory from setup.py in Python distutils?

I am distributing a package that has this structure:
mymodule:
    mymodule/__init__.py
    mymodule/code.py
    scripts/script1.py
    scripts/script2.py
The mymodule subdir of mymodule contains code, and the scripts subdir contains scripts that should be executable by the user.
When describing a package installation in setup.py, I use:
scripts=['myscripts/script1.py']
to specify where the scripts should go. During installation they typically end up in some platform/user-specific bin directory. The code that I have in mymodule/mymodule needs to call these scripts, though. What is the correct way to find the full path to them? Ideally they should be on the user's PATH at that point, so if I want to call them from the shell, I should be able to do:
os.system('script1.py args')
But I want to call the script by its absolute path, and not rely on the platform specific bin directory being on the PATH, as in:
# get the directory where the scripts reside in current installation
scripts_dir = get_scripts_dir()
script1_path = os.path.join(scripts_dir, "script1.py")
os.system("%s args" %(script1_path))
How can this be done? thanks.
EDIT: moving the code out of the scripts is not a practical solution for me. The reason is that I distribute jobs to a cluster system, and the way I usually do it is like this: imagine you have a set of tasks you want to run. I have a script that takes all tasks as input and then calls another script, which runs only on the given task. Something like:
main.py:
for task in tasks:
    cmd = "python script.py %s" % (task)
    execute_on_system(cmd)
so main.py needs to know where script.py is, because it needs to be a command executable by execute_on_system.
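For illustration only, one possible sketch of that call, assuming the scripts were installed into the default scripts directory of the running interpreter (a --user or custom --install-scripts install can put them elsewhere, which is what the answers below deal with):

import os
import sysconfig

# Ask the standard library where scripts are installed for this interpreter,
# then call script.py by its absolute path instead of relying on PATH.
scripts_dir = sysconfig.get_path("scripts")
script_path = os.path.join(scripts_dir, "script.py")

for task in tasks:
    cmd = "python %s %s" % (script_path, task)
    execute_on_system(cmd)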
I think you should structure your code so that you don't need to call scripts from your code. Move the code you need from the scripts into your package, and then you can call it both from your scripts and from your code.
My use case for this was to check that the directory my scripts are installed into is in the user's path and give a warning if not (since it is often not in the path if installing with --user). Here is the solution I came up with:
from setuptools.command.easy_install import easy_install

class my_easy_install(easy_install):
    # Match the call signature of the easy_install version.
    def write_script(self, script_name, contents, mode="t", *ignored):
        # Run the normal version.
        easy_install.write_script(self, script_name, contents, mode, *ignored)
        # Save the script install directory in the distribution object.
        # This is the same thing that is returned by the setup function.
        self.distribution.script_install_dir = self.script_dir

...

dist = setup(...,
             cmdclass={'build_ext': my_builder,  # I also have one of these.
                       'easy_install': my_easy_install,
                       },
             )

if dist.script_install_dir not in os.environ['PATH'].split(':'):
    # Give a sensible warning message...
I should point out that this is for setuptools. If you use distutils, then the solution is similar, but slightly different:
from distutils.command.install_scripts import install_scripts

class my_install_scripts(install_scripts):  # For distutils
    def run(self):
        install_scripts.run(self)
        self.distribution.script_install_dir = self.install_dir

dist = setup(...,
             cmdclass={'build_ext': my_builder,
                       'install_scripts': my_install_scripts,
                       },
             )
I think the correct solution is
scripts=glob("myscripts/*.py"),

Making exe using py2exe + sqlalchemy + mssql

I have a problem making an exe using py2exe. In my project I'm using sqlalchemy with the mssql module.
My setup.py script looks like:
from distutils.core import setup
import py2exe
setup(
    windows=[{"script": "pyrmsutil.py"}],
    options={"pyrmsutil": {
        "includes": ["sqlalchemy.dialects.mssql", "sqlalchemy"],
        "packages": ["sqlalchemy.databases.mssql", "sqlalchemy.cresultproxy"]
    }})
But when I start the build with:
python.exe setup.py py2exe
I get a build log with the following errors:
The following modules appear to be missing
['_scproxy', 'pkg_resources', 'sqlalchemy.cprocessors', 'sqlalchemy.cresultproxy']
And in "dist" folder i see my pyrmsutil.exe file, but when i'm running it nothing happens. I mean that executable file starts, but do nothing and ends immediately without any pyrmsutil.exe.log. It's very strange.
Can anybody help me with this error?
I know it's not an answer per se, but have you tried PyInstaller? I used to use py2exe and found it tricky to get something truly distributable. PyInstaller requires a little more setup, but the docs are good and the result seems better.
To solve this issue you could try searching for the mentioned DLLs and placing them in the folder with the exe, or in the folder where you build it.
Looks like py2exe can't find the sqlalchemy C extensions.
Why not just include the egg in the distribution, put sqlalchemy in py2exe's excludes and load the egg on startup?
I use this in the start script:
import sys
import path
import pkg_resources
APP_HOME = path.path(sys.executable).parent
SUPPORT = APP_HOME / 'support'
eggs = [egg for egg in SUPPORT.files('*.egg')]
reqs, errs = pkg_resources.working_set.find_plugins(
    pkg_resources.Environment(eggs)
)
map(pkg_resources.working_set.add, reqs)
sys.path.extend(SUPPORT.files('*.egg'))
I use Jason Orendorff's path module (http://pypi.python.org/pypi/path.py), but you can easily swap it out if you want.
