I write and maintain a Python library for quantum chemistry calculations called PyQuante. I have a fairly standard Python distribution with a setup.py file in the main directory, a subdirectory called "PyQuante" that holds all of the Python modules, and one called "Src" that contains source code for C extension modules.
I've been lucky enough to have some users donate code that uses Cython, which I hadn't used before, since I started PyQuante before either it or Pyrex existed. On my suggestion, they put the code into the Src subdirectory, since that's where all the C code went.
However, looking at the code that generates the extensions, I wonder whether I should have simply put the code in subdirectories of the Python branch instead. And thus my question is:
What are the best practices for the directory structure of Python distributions with both Python and Cython source files?
Do you put the .pyx files in the same directory as the .py files?
Do you put them in a subdirectory of the one that holds the .py files?
Do you put them in a child of the .py directory's parent?
Does the fact that I'm even asking this question betray my ignorance at distributing .pyx files? I'm sure there are many ways to make this work, and am mostly concerned with what has worked best for people.
Thanks for any help you can offer.
Putting the .pyx files in the same directory as .py files makes the most sense to me. It's what the authors of scikit-learn have done and what I've done in my py-earth module. I guess I think of Cython modules as optimized replacements for Python modules. I will often begin by writing a package in pure Python, then replace some modules with Cython if I need better performance. Since I'm treating Cython modules as replacements for Python modules, it makes sense to me to keep them in the same place. It also works well for test builds using the --inplace argument.
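For what it's worth, here is a minimal sketch of what a setup.py for that layout can look like, assuming the .pyx files sit next to the .py files inside the package directory (the package and module names here are illustrative, not from any particular project):

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="mypackage",
    packages=["mypackage"],
    # Pick up every .pyx file that lives alongside the .py modules.
    ext_modules=cythonize("mypackage/*.pyx"),
)

Running python setup.py build_ext --inplace then drops the compiled extensions next to their sources, so the package can be imported straight from the working tree during development.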
I have a .py file that imports from other Python modules, which in turn read from config files, other modules, etc.
I need to move the code required to run that .py file, but only the files the .py file actually reads from (I am not talking about packages installed via pip install; I mean other Python files in the project directory, mostly classes, functions and ini files).
Is there a way to find out only the external files used by that particular python script? Is it something that can be found using PyCharm for example?
Thanks!
Static analysis tools (such as PyCharm's refactoring tools) can (mostly) figure out the module import tree for a program (unless you do dynamic imports using e.g. importlib.import_module()).
However, it's not really possible to know statically and definitively what other files are required for your program to function. You could use Python's audit events (or strace/ptrace or similar OS-level tools) to look at what files are being opened by your program (e.g. while your tests run (you do have tests, right?), or during regular program use), but it's likely not going to be exhaustive.
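As a rough sketch of the audit-event approach (available in Python 3.8+), you can register a hook that records every file your program opens while it runs; where and how you run your program afterwards is up to you:

import sys

opened_files = set()

def audit_hook(event, args):
    # The "open" event fires whenever a file is opened;
    # args[0] is the path (or file descriptor) being opened.
    if event == "open":
        opened_files.add(str(args[0]))

sys.addaudithook(audit_hook)

# ... run your program or your test suite here ...

for path in sorted(opened_files):
    print(path)

It still won't be exhaustive: it only records what actually ran, so files behind untested code paths will be missed.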
I am currently creating a Python library for my own projects, and I need certain things to run much faster than normal Python can manage; Cython is the only way I can think of to do this.
I have created a setup.py file and have tried multiple methods of achieving the Cython build:
I have used
from distutils.core import setup
from Cython.Build import cythonize
# Note: filePath stands for the actual path to the .pyx file, not a variable
setup(ext_modules=cythonize(filePath))
Running python setup.py install builds the extension and then installs it; however, it also generates many extra folders and files from previous projects where I have used Cython. I expected only the file I had given it to be built into an extension module.
I have tried different methods of creating the extension files; however, none of them behave any differently, and they all give the same result: loads of folders and files created in my project that I didn't ask for.
Any help as to how I should solve this problem would be greatly appreciated.
Thank you
I fixed this issue.
It turns out that Cython treats files with the same name as belonging to the same project, so simply changing the name of my project was enough to fix it. This is not intuitive, though in a way it makes sense.
I hope this helps anyone who comes across this problem.
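As a side note, and assuming a reasonably recent Cython, you can also point the generated C files at a dedicated directory so they don't pile up in your project root; the module name and path below are illustrative:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    # build_dir keeps the Cython-generated .c files under build/cython
    # instead of next to your sources.
    ext_modules=cythonize("mymodule.pyx", build_dir="build/cython")
)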
I am on Python 2.7 and new to Cython.
Background:
I have 20+ .py files in my project, and I found that the slowness was coming from 3 of them.
So I used Cython for those files; they are now compiled with Cython into .pyd files without any issue. (I spent days investigating the problem, looking for the best solution, and improving the way I code in Python, but I still have to use Cython for performance reasons.)
Apart from the .pyd file, there are a few more files under the build folder with the same filename but different extensions, namely ".c", ".exp", ".lib", ".obj" and ".pyd.manifest".
It seems like the project still works and performance remains at Cython level even after I moved those files (".c", ".exp", ".lib", ".obj" and ".pyd.manifest") away.
I am confused by those output files from the compiler: I'm not sure which of them are necessary and which are not, and how I should use and treat them.
My setup.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(
ext_modules=cythonize("myCythonFile.pyx",)
)
All of these files are temporary files.
Cython compiles each of your pyx files (you only have one) to C code in matching .c files. It can also emit other files, like an HTML file to make the C code more readable, but by default, this is all it gives you, and you didn't ask for anything extra.
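If you do want that HTML annotation, you can ask cythonize for it explicitly; this is optional and only changes which extra files get generated:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    # annotate=True writes myCythonFile.html next to the generated .c file,
    # highlighting which lines still go through the Python C API.
    ext_modules=cythonize("myCythonFile.pyx", annotate=True)
)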
Cython then asks whatever C compiler you have configured via distutils (in your case, that's MSVC, the C and C++ compiler that comes with Visual Studio) to build a .dll/.pyd file out of those .c files. The full details of what files that creates and what they mean depend on your compiler version, but basically it creates a .obj file for each .c file, then a .lib import library and .exp export library to go with your .dll, and a .manifest file that allows loading the library as an assembly.
Some of these files—in particular the .c and .obj files—are very handy for debugging if something goes wrong in the compiled code. (Cython-generated C code can be pretty ugly to trace through, but raw machine code can be even worse.)
All of these files can help make rebuilds after minor changes faster.
Some of these files are also needed if you want to do more complicated things like linking other libraries against your library.
If you're not doing any of those things, you don't need them. But there's also really no reason to get rid of them. (If you want to redistribute your code, you're probably going to build a source package, and a binary wheel, and both of those know how to skip over unnecessary intermediate files.)
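If you do decide you want a clean tree anyway, something along these lines works. It's only a sketch, and it assumes the only .c files in the directory are Cython-generated ones:

import os
import shutil

# Remove the distutils build directory (roughly equivalent to
# running "python setup.py clean --all").
if os.path.isdir("build"):
    shutil.rmtree("build")

# Remove the generated .c files; skip this if you keep hand-written C here.
for name in os.listdir("."):
    if name.endswith(".c"):
        os.remove(name)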
Consider that I have a package called "A" consisting of several modules and also nested packages. Now, I want to distribute this package to users, and I do not want them to see my code at all. I heard that ".pyc" files can be decompiled. So, I am just wondering what the other alternatives for this problem could be.
It would be great if someone could give some ideas in this regard.
You actually have a few options. First, you can compile your code into pyc files. However, this can be circumvented with the disassembler library dis, though that requires a lot of technical know-how. You can also use py2exe to package it as an exe file; this converts the pyc file into an exe file. It can still be disassembled, but it adds an extra layer. You also have a few encryption solutions; for example, you can use pyconcrete to encrypt your imports until they are loaded into memory. You could also encrypt the entire application and ship the decrypter and launcher with it as a C/C++ application (or any other compiled language). Lastly, if you are comfortable with getting Python to run custom C/C++ code, you can put your private code into a DLL or SO and call it directly from the script.
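One concrete variant of the compiled-code route (not a guarantee against reverse engineering, just a higher bar) is to compile your own .py modules into extension modules with Cython, so the shipped package contains binaries rather than source. A minimal sketch, with hypothetical module paths:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="A",
    # Cython can compile plain .py modules, not just .pyx files.
    ext_modules=cythonize(["A/core.py", "A/helpers.py"]),
)

The resulting .so/.pyd files can still be reverse engineered with enough effort, like everything else listed above.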
Python is an interpreted language. That means that if you want to distribute pyc files, you'll have to have them run on the same OS/architecture as yours, or you'll run into subtle problems. That, and the fact that most code can be decompiled to some degree, would lead me to urge you to rethink your use case.
Can you rethink your package as a service instead?
We've got a (Windows) application, with which we distribute an entire Python installation (including several 3rd-party modules that we use), so we have consistency and so we don't need to install everything separately. This works pretty well, but the application is pretty huge.
Obviously, we don't use everything available in the runtime. I'd like to trim down the runtime to only include what we really need.
I plan on trying out py2exe, but I'd like to try and find another solution that will just help me remove the unneeded parts of the Python runtime.
One trick I've learned while trimming down .py files to ship: delete all the .pyc files in the standard library, then run your application thoroughly (that is, enough to be sure all the Python modules it needs will be loaded). If you examine the standard library directories, there will be .pyc files for all the modules that were actually used. .py files without a .pyc are ones that you don't need.
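A rough sketch of automating that check afterwards, assuming a Python 2-era layout where each imported module gets a .pyc right next to its .py (the standard-library path below is illustrative):

import os

stdlib_dir = "/usr/lib/python2.7"  # adjust for your installation

for root, dirs, files in os.walk(stdlib_dir):
    for name in files:
        if name.endswith(".py") and not os.path.exists(os.path.join(root, name + "c")):
            # No .pyc was generated, so this module was never imported.
            print(os.path.join(root, name))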
Both py2exe and pyinstaller (NOTE: for the latter, use the SVN version; the released one is VERY long in the tooth ;-) do their "trimming" via modulefinder, the standard-library module for finding all modules used by a given Python script; you can of course use modulefinder yourself to identify all needed modules, if you don't trust pyinstaller or py2exe to do it properly and automatically on your behalf.
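Here's a small sketch of using modulefinder directly, if you'd rather see the list yourself (the entry-point script name is illustrative):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("your_app.py")
finder.report()  # prints every module found, plus ones that couldn't be located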
This py2exe page on compression suggests using UPX to compress any DLLs or .pyd files (which are really just DLLs anyway). Obviously this doesn't help with trimming out unneeded modules, but it can trim down the size of your distribution, if that's a big concern.
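If you go the UPX route, here is a hedged sketch of compressing everything in a built output directory from Python; it assumes the upx executable is on your PATH, and the directory name is illustrative:

import os
import subprocess

dist_dir = "dist"  # wherever your build lands

for root, dirs, files in os.walk(dist_dir):
    for name in files:
        if name.endswith((".pyd", ".dll")):
            # --best asks UPX for its strongest (slowest) compression.
            subprocess.call(["upx", "--best", os.path.join(root, name)])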