Why is the Python build of OpenCV (cv2.pyd) so small?

Why is the Python build of OpenCV (cv2.pyd) so small? - python

I wanted to use OpenCV with Python, so I downloaded OpenCV for Windows and got a folder of ~3.7GB after decompression. What surprised me was that the only file I needed was cv2.pyd, which was so small (~11MB) comparing to the C builds (~674MB). I simply copied it to my Python lib-packages folder without adding anything to my PATH and it worked perfectly.
I don't know how Python binding works, and I thought it should call C/C++ implementations under the hood. However, cv2 did not seem to require any C/C++ library. It just looks like magic to me.

Most likely it has something to do with static linking and using all possible tricks found in "Reducing Executable Size" or "GCC x86 code size optimizations"
OpenCV uses cmake as build system, which provides "MinSizeRel" build type. It seems to auto-apply most of those tricks. Couldn't find any good documentation on that, hence: [citation needed]
(follows my original answer which didn't quite address the actual question)
More convenient way to get opencv for python may be to download it from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv
After running installer you'll find cv2.pyd in c:\python27\lib\site-packages
As far as we are concerned .pyd file is same as .dll: http://docs.python.org/2/faq/windows.html#is-a-pyd-file-the-same-as-a-dll
Which means that we can use Dependency Walker to look into it. This is what we see:
This picture means that cv2.pyd is dynamically linked against opencv libraries which contain actual functionality. These take around ~45MB of disk space.

Related

Run Python Library as script

Some important background upfront, I am using a computer that does not give me access to pip. In fact I do not have access to the command prompt. This make is it impossible for me to install additional libraries unfortunately (at least the standard way).
My question is whether I can run a python library without formally installing it. Could I download the library, and then store it the same directory as my main script, and then import it like I would with a multi .py script project with functions being defined in other files, almost as if I had written the script natively on my computer?
Specifically, I would like to use pdfminer.six. Apparently it is written completely in python, however, I realize that may not mean what I think it does. It may be similar to numpy which I understand has C++ code associated with it.

You can import any script or lib from your current folder (example). You can find any lib you want by googling 'lib_name github'. Download the zip and unpack it in your folder, it should work.
You can also go to your python Lib folder on another computer and copy libs from there (By default: C:\Users\User\AppData\Local\Programs\Python\Python310\Lib)

Maybe you can use a web-based-interpreter solution like Google Colab and work in your browser.
https://colab.research.google.com

Specify a path for ctypes.util.find_library() under macOS

I'd like to specify a path for ctypes.util.find_library() to search. How can I do this?
I'd like to do it from within Python.
I'm using macOS.
If I wanted to do it from outside Python, I can specify the location using LD_LIBRARY_PATH. However I have read that I cannot modify this environment variable from within Python as it is cached on Python's startup. Modifying the value and then restarting Python seems like a very unusable idea; for example, what would happen if the library was imported part way through execution?
Why would I like to do this? Because I would like to add a MacOS wheel to a Python library that works under Windows. Currently they're packaging the DLLs into the same directory as the Python source files and adding that path to Windows' PATH environment, which ctypes.util.find_library() searches--a technique that I can't seem to replicate under Mac.
I have tried to understand delocate. It seems to state that the Python library doesn't depend on any shared objects. I suspect this is because the dylibs are loaded dynamically using ctypes.util.find_library() rather than being compiled code within Python.
Any suggestions would be gratefully received!

Although there are environment variables (LD_LIBRARY_PATH and DYLD_LIBRARY_PATH) that impact the search path for shared libraries, they are read and fixed by Python when Python starts up. Thus that option was eliminated.
Instead, we took the approach hinted at in the Python documentation:
If wrapping a shared library with ctypes, it may be better to determine the shared library name at development time, and hardcode that into the wrapper module instead of using find_library() to locate the library at runtime.
We hard-coded the names of the libraries that were included for each operating system, plus we included a generic name for operating systems where the libraries were not included:
names = {
"Windows": "libogg.dll",
"Darwin": "libogg.0.dylib",
"external": "ogg"
}
The library loading code checks if it can find the specified library for the current operating system. If it isn't successful, a search is made for the 'external' library.
The library loading code can be found inside the PyOgg project; see Library.load().
Further to this, inspired by the delocate project, we were required to edit the paths found inside certain dylibs. For example, we edited the opusfile binary to point to the ogg binary found in our package:
install_name_tool -change /usr/local/opt/libogg/lib/libogg.0.dylib #loader_path/libogg.0.dylib libopusfile.0.dylib
For the details on this process please see the README file associated with the macOS binaries inside the PyOgg project.
Further discussion of the approach can be found in the issue raised on PyOgg's GitHub page (#32).

How to efficiently browse OpenCV repository to understand the code?

I am relatively new to coding and I apologize if my questions are straightforward to you.
I am trying to understand OpenCV code to be able to add my contributions (mainly converting 2D tools to 3D as it would be useful for my machine learning projects and for medical projects). There is also some extra-curiosity since I like to understand how things work.
1) On the example of the GaussianBlur method. What happens when I call it in Python? Namely, how the Python code is bind to the C++ one? When I browse the repository, there are all C++ files, and I do not find where it is done. When I installed cv2 with pip all was automatic, but I would like to understand the process.
2) if I want to understand the whole GaussianBlur algorithm, I am also not familiar with C++ browsing, so how should I proceed to retrieve what files are used (methods and also inherited classes).
I've found on another answer that https://github.com/opencv/opencv/blob/9c23f2f1a682faa9f0b2c2223a857c7d93ba65a6/modules/imgproc/src/smooth.cpp#L4085 contains the method, but how can I find any method on my own? Why isn't it in the master folder but in the blob folder? How to find then the other methods or classes called by this one?
3) this is more a curiosity question since I am not familiar with makefiles, but when is done the binding between Python and C++? When I install OpenCV with pip it is done automatically, but I would like to understand the process.
Thanks a lot for your answers! I would appreciate any tutorial since I've googled a lot before asking, of course, but did not find what could help me on my own.

In C++ you have to download the library and link them in the compilation and linking process (when creating an executable from source code).
The C++ bind is done with python.h library for c++. using this binding OpenCV module is created for python.
For learning gaussian blur, etc. you can learn image processing.
The methods of Opencv are kept in their respective files. like opencv2/highgui.hpp for OpenCV GUI like imshow. you can import them to C++ with #include <opencv2/highgui.hpp>(the methods are separated to different files to reduce importing unnecessary methods).
CMake is like a scripting language(its a tool) where you can write a script on how the tool should build the executable from source code.
The starter tutorial is Here

Building python 3.6.6 from source on Win10

I've downloaded the python 3.6.6 source from here...
https://www.python.org/downloads/release/python-366/
...and followed the instruction on how to build on Windows (run ../PCbuild/build.bat). Python compiles and seems to be working (funny and scary: while fetching externals, it actually downloads python-3.7.0 as a dependency... :/ ). However, it looks like the build is somehow 'in place', and the binaries end up in some sub-folder of the source (../PCbuild/amd64/python.exe). This means I'm left with source and compiled code mixed up instead of some clean/lean and deployable package.
can I somehow provide '--prefix=/target/build/path' to define a target location to build to, like I would on linux?
is there a way of removing all src files/folders and leave only the required files/folders (../lib, ../include, etc...).
Or in general, is there a way of making the build process more behave like on linux?
Thanks for your help,
Max

The build.bat from PCBuild is intended for developers, that is, for testing purposes. What you want is under \Tools\msi\buildrelease.bat. This creates a subdirectory under \PCBuild\ that places all msi, cab and exe files ready for later installation. According to the readme, there doesn't seem to be an option to pack all those files in a single .exe file, like all installers eventually do, but another option is under \Tools\msi\build.bat which does have an option for packing (namely build.bat --pack). "But", the readme does state that the buildrelease.bat should be used for an official release. The advantage of doing so is that Pyhton would be optimized using PGO to your own hardware. I am also trying to compile from source using this method but I am having an issue with a recurring error (and other ones):
PGO run did not succeed (no python36!*.pgc files) and there is no data to merge [E:\RepoGiT\3.6\PCbuild\pythoncore.vcxproj]
so, if you do go this route, and find this, or other errors, please send the bug report to python's bug tracker webpage. And better yet, if you find errors and their solution, please report back here!

How to remove the dependence of Python extensions on the UCRT

I am using cython to generate *.c files, to be later compiled with the MS Visual Studio 2017 as C/C++. It all works splendid, with the minor exception that all python *.lib were dynamically linked.
Since my goal is to produce a self-contained exe (large exe size is not a problem), I would like to ask if it is possible to static-link all the Python *.lib. I already tried specifying the \MT release option and defining all Python libraries on the Debugger include.
Unfortunately, all my efforts were futile, since the dynamically linked executable can't find the python3.dll when copied to another computer. Currently I plan to copy the Entire python install directory together with the executable and specify the proper include links when compiling.
Therefore, I am interested in any option, it it exists, to produce a self-contained portable executable.
I would appreciate your help and advice.

xaav is correct.
I cannot comment so Instead I will post this as a solution in the hopes it will direct you to right path.
Cython exists for a reason. You get your python code, add a few changes and bam, your code is cythonised.
This is good for two reasons. To obfuscate the code and it can speed up the code (depends).
Why do you not use cython and pyinstaller? This is tried and tested. Pyinstaller even says that it supports it. The approach you are taking can be done in theory but it is so overly complicated and not even needed.
Possible concerns:
But can't they steal my source code? No, it's cythonised so yes but not easily.
Can't I use Nuitka? Yes, if you want it to be buggy and not work as intended.
What about the libraries, they do not work on another pc? Spec files exist for a reason. A bit of manual handling and this can work.
Can't I compile to c++ and then make it standalone? Take a look at the number of unanswered questions and people who could not get it to work. Also, it is not needed when pyinstaller and cython exist and does the same thing. Cython is widely supported. It just feels like you are doing things the long and hard way.
But won't compiling to c++ be easier. No way, pyinstaller already does most of the leg work. You might have to adjust the spec file here and there, but otherwise it's the only way to go. Keep in mind it also has integration with pyupdater too.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.