Running Cython install generates unwanted files - python

I am currently creating a Python library for my Python projects, and I need certain parts of it to run much faster than plain Python can manage; Cython is the only way I can think of to achieve this.
I have created a setup.py file and have tried multiple methods of getting the Cython build to work:
I have used
from distutils.core import setup
from Cython.Build import cythonize
# Note: filePath here stands for the actual path to the .pyx file (written out as a string), not a variable
setup(ext_modules=cythonize(filePath))
Running python setup.py install builds the extension and then installs it; however, it also generates many extra folders and files from previous projects where I have used Cython. I only expected the file I had given it to be built into an extension module.
I have tried different methods of creating the extension files, but none of them behave any differently; they all give the same result: loads of folders and files created in my project that I didn't ask for.
Any help as to how I should solve this problem would be greatly appreciated.
Thank you

I fixed this issue.
It turns out that Cython treats files with the same name as belonging to the same project, so simply renaming my project was enough to fix it. This is not intuitive, though in a way it makes sense.
I hope this helps anyone who comes across this problem.
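As a side note, if the clutter you want to avoid is the generated .c files landing next to your sources, cythonize also takes a build_dir option that keeps them in a separate directory. A minimal sketch, where the file name and directory are placeholders:
from distutils.core import setup
from Cython.Build import cythonize

# build_dir writes the generated C files under build/cython instead of
# next to the .pyx sources; "my_module.pyx" is a placeholder name.
setup(ext_modules=cythonize("my_module.pyx", build_dir="build/cython"))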

Related

Is there a way to import a .pyd extension file as a simple include in a python file?

Hi there, wise people of Stack. I'm having trouble importing .pyd files as Python objects.
The story:
I have an internal repo on GitLab that contains Python files as well as C++ files. The repo uses pybind so the two languages can talk to one another. The whole project is built with CI/CD, and the artifacts I have access to are .pyd extension files.
The task I was given is to access some .pyd files (in different folders) from a single Python script and use the classes they contain (encoded inside the .pyd file) in order to mock them in Python.
The problem:
I was told that a simple import would let me access the .pyd as an object in Python, just like you would with a library. However, I ran into errors along the way. I have gone through this post and this one, but it seems that neither of them works for me.
What was tried:
The first thing I did was set up a separate folder with a single .pyd file from the project (let's call it SomeClass.pyd). I then created a Python file, test.py, in the same directory as the .pyd file.
The whole architecture looks like the following:
|--folder
|--SomeClass.pyd
|--test.py
Then, in the test.py file, I tried running
import SomeClass.pyd
import SomeClass
import SomeClass.pyd as sc
from SomeClass.pyd import *
from SomeClass import *
all of which yielded the same error:
ImportError: dynamic module does not define module export function
Now, I know that .pyd files are similar to DLLs, but I was told multiple times that a simple import would let me access the object information without needing anything in particular.
I recall reading about setting PYTHONPATH before launching the whole process. However, I need the script to access the .pyd without having to add any variable to the path, as I will likely not always have the access rights to change PYTHONPATH.
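For reference, extending sys.path at runtime avoids touching the PYTHONPATH environment variable entirely; a minimal sketch, where the folder path is a placeholder:
import os
import sys

# Make the folder containing SomeClass.pyd importable for this run only;
# "path/to/folder" is a placeholder.
sys.path.insert(0, os.path.abspath("path/to/folder"))

import SomeClass  # imported after the sys.path tweak on purpose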
The project is quite big, so I'm trying to keep this to the bare minimum, but if you need more info, I'll try to give some more.
Thank you for your feedback!
Alright, after some time and a lot of research, I found the odd answer to the problem that occurred. I really hope it will help anyone encountering the same issue.
The problem was caused by the fact that PyCharm sometimes has issues with dynamic imports.
First problem: dynamic import
This was solved simply by going to PyCharm --> File --> Invalidate Caches, then ticking "Clear file system cache and Local History" as well as "Clear VCS Log caches and indexes". You should then be prompted to restart.
I'll also add that even after fixing the issue, sometimes, for no apparent reason, I still have to invalidate the caches again.
Second problem: venv
Once restarted, you might be able to import your .pyd file manually via its path, but you probably won't get autocompletion. What solved this for me was manually compiling the code responsible for the .pyd in order to generate a wheel. In my case, I used Poetry:
poetry build
Once the wheel was created, I did a manual pip install of it to put it directly into the venv:
pip install dist/the_name_of_your_wheel_file.whl
These steps were the ones to fix my problem. I hope this will help anyone encountering the same problem!
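As a quick sanity check from the same venv (using the module name from earlier), something like the following should now resolve to the wheel-installed copy:
import SomeClass
print(SomeClass.__file__)  # expected to point into the venv's site-packages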

What are the multiple output files from Cython for?

I am on Python 2.7 and new to Cython.
Background:
I have 20+ .py files in my project, and I found that the slowness comes from 3 of them.
So I used Cython for those files; they now compile with Cython into .pyd files without any issue. (I spent days investigating the problem, looking for the best solution and improving the Python code itself, but I still have to use Cython for performance reasons.)
Besides the .pyd file, under the build folder there are a few more files with the same filename but different extensions, namely ".c", ".exp", ".lib", ".obj" and ".pyd.manifest".
The project seems to keep working, and the performance stays at the Cython level, even after I moved those files (".c", ".exp", ".lib", ".obj" and ".pyd.manifest") away.
I am confused by those compiler output files: I'm not sure which are necessary and which are not, or how I should use and treat them.
My setup.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize("myCythonFile.pyx"),
)
All of these files are temporary files.
Cython compiles each of your pyx files (you only have one) to C code in matching .c files. It can also emit other files, like an HTML file to make the C code more readable, but by default, this is all it gives you, and you didn't ask for anything extra.
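For example, if you do want that HTML annotation (it shows how each Python line maps to C and highlights the expensive spots), you can ask cythonize for it explicitly; a small sketch based on your setup.py:
from distutils.core import setup
from Cython.Build import cythonize

setup(
    # annotate=True additionally writes myCythonFile.html next to the generated .c file
    ext_modules=cythonize("myCythonFile.pyx", annotate=True),
)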
Cython then asks whatever C compiler you have configured via distutils (in your case, that's MSVC: Microsoft Visual C++, the C and C++ compiler that comes with Visual Studio) to build a .dll/.pyd file out of those .c files. The full details of which files that creates and what they mean depend on your compiler version, but basically it creates a .obj file for each .c file, then a .lib import library and .exp export file to go with your .dll, and a .manifest file that allows loading the library as an assembly.
Some of these files—in particular the .c and .obj files—are very handy for debugging if something goes wrong in the compiled code. (Cython-generated C code can be pretty ugly to trace through, but raw machine code can be even worse.)
All of these files can help make rebuilds after minor changes faster.
Some of these files are also needed if you want to do more complicated things like linking other libraries against your library.
If you're not doing any of those things, you don't need them. But there's also really no reason to get rid of them. (If you want to redistribute your code, you're probably going to build a source package, and a binary wheel, and both of those know how to skip over unnecessary intermediate files.)
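Roughly, assuming a conventional setup.py and the wheel package installed, those two distributions are built with:
python setup.py sdist        # source distribution: ships the sources, not the build intermediates
python setup.py bdist_wheel  # binary wheel: ships the built .pyd, not the .obj/.lib/.exp files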

How to install python binding of a C++ library

Imagine that we are given the finished C++ source code of a library called MyAwesomeLib. The goal is to expose some of its power to Python, so we create a wrapper using SWIG and generate a Python package called PyMyAwesomeLib.
The directory structure now looks like
root_dir
|-src/
|-lib/
| |- libMyAwesomeLib.so
| |- _PyMyAwesomeLib.so
|-swig/
| |- PyMyAwesomeLib.py
|-python/
|- Script_using_myawesomelib.py
So far so good. Ideally, all we want to do next is copy lib/*.so, swig/*.py and python/*.py into the corresponding directories in site-packages in a Pythonic way, i.e. using
python setup.py install
However, I got very confused when trying to achieve this simple goal using setuptools and distutils. Both tools handle the compilation of Python extensions through an internal system, where the source files, compiler flags, etc. are passed via setup(ext_modules=[Extension(...)]). But this seems pointless, since MyAwesomeLib already has a fully functioning build system based on makefiles. Porting the logic embedded in the makefiles would be redundant and completely unnecessary work.
After some research, it seems there are two options left: I can either override setuptools.command.build and setuptools.command.install to use the existing makefile and copy the results directly, or I can somehow let setuptools know about these files and ask it to copy them during installation. The second way is more appealing, but it is also what gives me the most headache. I have tried the following options without success:
package_data and include_package_data do not work, because the *.so files are not under version control and are not inside any package.
data_files does not seem to work, since the files only get included when running python setup.py sdist but are ignored by python setup.py install. This is the opposite of what I want: the .so files should not be included in the source distribution, but should get copied during the installation step.
MANIFEST.in failed for the same reason as data_files.
eager_resources does not work either, but honestly I do not know the difference between eager_resources and data_files or MANIFEST.in.
I think this is actually a common situation, and I hope there is a simple solution to it. Any help would be greatly appreciated.
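For concreteness, here is a minimal sketch of the first of the two options mentioned above: a custom build_ext that shells out to the existing makefile and copies the prebuilt .so files into distutils' build tree, so the normal install step picks them up. Every path and target name below is a placeholder:
import shutil
import subprocess
from distutils.command.build_ext import build_ext as _build_ext
from distutils.core import setup

class MakefileBuildExt(_build_ext):
    # Delegate compilation to the existing makefile, then copy the
    # prebuilt shared objects into the build tree.
    def run(self):
        subprocess.check_call(["make", "-C", "src"])  # placeholder make invocation
        self.mkpath(self.build_lib)
        for built in ("lib/libMyAwesomeLib.so", "lib/_PyMyAwesomeLib.so"):
            shutil.copy(built, self.build_lib)

setup(
    name="PyMyAwesomeLib",
    py_modules=["PyMyAwesomeLib"],
    package_dir={"": "swig"},  # the SWIG-generated wrapper lives in swig/
    cmdclass={"build_ext": MakefileBuildExt},
)
# invoked e.g. as:  python setup.py build_ext install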
Porting the logic embedded in makefiles would be redundant and completely unnecessary work.
Unfortunately, that's exactly what I had to do. I've been struggling with this same issue for a while now.
Porting it over actually wasn't too bad. distutils does understand SWIG extensions, but this was implemented rather haphazardly on their part. Running SWIG creates Python files, and the current build order assumes that all Python files have been accounted for before build_ext runs. That one wasn't too hard to fix, but it's annoying that they would claim to support SWIG without mentioning this. distutils does attempt to be cross-platform when compiling things, so there is still an advantage to using it.
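For what it's worth, the SWIG support in distutils mentioned here looks roughly like the sketch below; the .i path, swig_opts and library names are guesses based on the layout in the question:
from distutils.core import setup, Extension

setup(
    name="PyMyAwesomeLib",
    ext_modules=[
        Extension(
            "_PyMyAwesomeLib",
            sources=["swig/PyMyAwesomeLib.i"],  # distutils runs SWIG on .i sources itself
            swig_opts=["-c++"],                 # generate C++ wrappers
            libraries=["MyAwesomeLib"],
            library_dirs=["lib"],
        )
    ],
)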
If you don't want to port your entire build system over, use the system's package manager. Many complex libraries do this (but they also try their best with setup.py). For example, to get numpy and lxml on Ubuntu you'd just do:
sudo apt-get install python-numpy python-lxml. No pip.
I realize you'd rather write one setup file instead of dealing with every package manager ever so this is probably not very helpful.
If you do try to go the setuptools route there is one fatal flaw I ran into: dependencies.
For instance, if you are distributing a SWIG-based project, it's going to need the Python development headers. If the user doesn't have them, an error like this happens:
#include <Python.h>
error: File not found
That's pretty unhelpful to the average user.
Even worse, if you require a shared library but the user's library is out of date, the user can get some crazy errors. You're at the mercy of their C++ compiler to output Google-friendly error messages so they can figure it out.
The long-term solution would be for setuptools/distutils to get better at detecting non-Python libraries, hopefully as good as Ruby's gem. I pretty much had to roll my own. For instance, in this setup.py I'm working on, you can see a few functions at the top that I hacked together for dependency detection (it still doesn't work on all systems... definitely not Windows).
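To give an idea of what that hand-rolled detection can look like (this is not the code from that setup.py, just an illustrative sketch), a check for the Python headers could be:
import os
import sysconfig

def have_python_headers():
    # Rough check that Python.h exists before attempting to compile extensions.
    include_dir = sysconfig.get_paths()["include"]
    return os.path.exists(os.path.join(include_dir, "Python.h"))

if not have_python_headers():
    raise SystemExit("Python development headers not found; "
                     "install your platform's python-dev package and retry.")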

distutils setup script under linux - permission issue

So I created a setup.py script for my Python program with distutils, and I think it behaves a bit strangely. First off, it installs all data_files into /usr/local/my_directory by default, which is a bit weird, since this isn't a very common place to store data, is it?
I changed the path to /usr/share/my_directory/. But now I'm not able to write to the database inside that directory, and I can't set the required permissions from within setup.py either, since the actual database file has not yet been created when I run it.
Is my approach wrong? Should I use another tool for distribution?
Because, at least for Linux, writing a simple setup shell script seems easier to me at the moment.
The immediate solution is to invoke setup.py with --prefix=/the/path/you/want.
A better approach would be to include the data as package_data. That way it will be installed alongside your Python package, and you'll find it much easier to manage (finding paths, etc.).
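A minimal sketch of the package_data approach, with placeholder names; the data then lives inside the installed package, so your code can locate it relative to its own __file__:
from distutils.core import setup

setup(
    name="my_program",
    packages=["my_package"],
    package_data={"my_package": ["data/initial.db"]},  # shipped inside the package
)

# inside my_package, the file can then be found with:
#   import os
#   DB_PATH = os.path.join(os.path.dirname(__file__), "data", "initial.db")
For a database the program actually writes to, you would typically still copy that shipped file to a user-writable location (for example under the user's home directory) on first run.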

Where do I put my cython files in a python distribution?

I write and maintain a Python library for quantum chemistry calculations called PyQuante. I have a fairly standard Python distribution with a setup.py file in the main directory, a subdirectory called "PyQuante" that holds all of the Python modules, and one called "Src" that contains source code for C extension modules.
I've been lucky enough to have some users donate code that uses Cython, which I hadn't used before, since I started PyQuante before either it or Pyrex existed. On my suggestion, they put the code into the Src subdirectory, since that's where all the C code went.
However, looking at the code that generates the extensions, I wonder whether I should have simply put the code in subdirectories of the Python branch instead. And thus my question is:
what are the best practices for the directory structure of python distributions with both Python and Cython source files?
Do you put the .pyx files in the same directory as the .py files?
Do you put them in a subdirectory of the one that holds the .py files?
Do you put them in a child of the .py directory's parent?
Does the fact that I'm even asking this question betray my ignorance at distributing .pyx files? I'm sure there are many ways to make this work, and am mostly concerned with what has worked best for people.
Thanks for any help you can offer.
Putting the .pyx files in the same directory as .py files makes the most sense to me. It's what the authors of scikit-learn have done and what I've done in my py-earth module. I guess I think of Cython modules as optimized replacements for Python modules. I will often begin by writing a package in pure Python, then replace some modules with Cython if I need better performance. Since I'm treating Cython modules as replacements for Python modules, it makes sense to me to keep them in the same place. It also works well for test builds using the --inplace argument.
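Concretely, for a layout like PyQuante's, that might look something like the sketch below (the module names are made up for illustration):
# Layout (sketch):
#   setup.py
#   PyQuante/
#       __init__.py
#       hartree_fock.py   # pure-Python module (example name)
#       integrals.pyx     # Cython replacement (example name)
from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="PyQuante",
    packages=["PyQuante"],
    ext_modules=cythonize("PyQuante/*.pyx"),
)
# For a test build next to the sources:  python setup.py build_ext --inplace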
