I have a question regarding building distribution files using cx_Freeze.
I have built several distribution packages for different sets of code and applications that I have written in Python.
I usually use cx_Freeze to make my build and distribution packages.
One of the key targets most of the time is the size of the package before and after installation.
Although cx_Freeze picks up the necessary modules, most of the time you end up adding certain libraries yourself, such as matplotlib backends, the numpy library, etc., when you use them as part of your code.
The key trick to reducing the size is excluding the modules that you don't need as part of your code.
Most of the time, for me, it is trial and error.
But how can one decide on the most optimized build, stripping all non-essential modules during the build?
For example, if my application is not GUI-based I end up removing tkinter, but once a matplotlib backend was using it and I had to bring it back again.
Is it always an iterative process?
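For illustration, here is a minimal sketch of that excludes/includes trick as a cx_Freeze setup.py; the entry script app.py and the specific module lists are assumptions, not taken from any particular application:

from cx_Freeze import setup, Executable

build_exe_options = {
    # strip modules the application never uses (found by trial and error)
    "excludes": ["tkinter", "unittest", "email", "xml"],
    # force-include modules cx_Freeze misses, e.g. a matplotlib backend
    "includes": ["matplotlib.backends.backend_agg"],
}

setup(
    name="myapp",
    version="0.1",
    options={"build_exe": build_exe_options},
    executables=[Executable("app.py")],
)

After each build, run the frozen executable; if it fails with an ImportError, take the offending module out of excludes and rebuild. That is the iterative loop in practice.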
When installing matplotlib, there is a significant cost to first-time use when running an image in single-threaded mode, indicated by the following message:
Matplotlib is building the font cache; this may take a moment.
Note that this first-time cost applies every time the code runs from a fresh Docker container, so in a business application that launches many containers the cost multiplies, and it must be fixed. This means that a matplotlib dependency in an image requires elaborating on the installation concept, discussed next.
Optionally, in the Dockerfile, I can trigger the cache build by importing the library:
RUN python -c "import matplotlib.font_manager"
as part of the matplotlib install. But IDEALLY an install is an install, and doesn't bleed into other components like glue or gum (very applied-science/handy-crafty rather than engineered).
I can see how "normally" this is not a problem, and re-paradigm-ing is a way to get things done. But we call installation "installation" for a reason and violating the boundaries of plumbing concepts in plumbing applications is ultimately going to be problematic.
How does one build the font cache upon installation so that my Docker image is more performant at startup?
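One approach, sketched below, is to warm the cache once at image build time with a small Python step (run via something like RUN python warm_fonts.py in the Dockerfile); the script name and the cache path are assumptions for illustration:

# warm_fonts.py: run while building the image, so the font cache is baked
# into an image layer instead of being rebuilt at every container start
import os

# optional: pin the cache to a known path inside the image (assumed path)
os.environ.setdefault("MPLCONFIGDIR", "/opt/mpl-cache")

import matplotlib.font_manager  # importing font_manager triggers the cache build
import matplotlib
print("font cache written under", matplotlib.get_cachedir())

If you do set MPLCONFIGDIR, keep the same value in the container's runtime environment so the cached files are actually found at startup.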
I have a Python script that I use to analyze data. I rely on number-crunching packages like numpy and others to work with my data. However, the packages constantly evolve, and some functions are deprecated, etc. This forces me to go through the script several times per year to fix errors and make it work again.
One of the solutions is to keep an older version of numpy. However, there are other packages that require a new version of numpy.
So the question I have is: is there a way to 1) keep multiple versions of a package installed, or 2) have a local copy of the package located in the directory of my script, so I am in control of what I am importing? For example, I could have my own package containing all the different packages and versions I need.
Later, I can simply import the libraries I want:
import my_package.numpy_1_15 as np115
import my_package.numpy_1_16_4 as np1164
and later in my code, I can decide which function to use from which numpy version. For example:
index = np115.argwhere(x == 0)
This is my vision of the solution to my problem, where I want to keep using old functions from previous versions of numpy (or other libraries). In addition, this way I can always have all the needed libraries with me in my script directory. So, if I need to run the script on a different machine, I don't have to spend hours figuring out if everything is compatible.
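As a sketch of the "local copy" half of this vision, a pinned copy of a package can be vendored next to the script with pip's --target option; the directory names below are assumptions. One caveat: two versions of numpy cannot be loaded into the same process, since both would register under the module name numpy, so vendoring realistically pins one version per script run:

# assumed one-time setup:  pip install --target=vendor/numpy_1_15 numpy==1.15.4
import os
import sys

# make the vendored copy shadow any system-wide numpy
vendor = os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendor", "numpy_1_15")
sys.path.insert(0, vendor)

import numpy as np
print(np.__version__)  # reports the vendored 1.15.4 on any machine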
Here are possible proposed solutions and why they do not solve my problem.
Virtual Environments in Python or Anaconda.
There are a bunch of introductions (for example) available that explain how to use them. However, virtual environments require initial setup and maintenance. Imagine if I could just have Python code that performs a specific computational task well, independent of what year it is and what packages are installed on the machine. The code could be shared among different research groups and would always work.
Creating a standalone Python executable on Linux.
I can create a standalone executable (example). However, it will be compiled and cannot be changed dynamically, losing a really nice feature of Python.
I've read the Python documentation chapter explaining how to embed the Python interpreter in a C/C++ application. Also, I've read that you can install Python modules either in a system-wide fashion, or locally to a given user.
But let's suppose my C/C++ application will use some Python modules such as SymPy, Matplotlib, and other related modules. And let's suppose end users of my application won't have any kind of Python installation in their machines.
This means that my application needs to ship with "pseudo-installed" modules, inside its data directories (just like the application has a folder for icons and other resources, it will need to have a directory for Python modules).
Another requirement is that the absolute path of my application's installation isn't fixed: the user can "drag" the application bundle to another directory and it will run fine there (it already works this way, and I wish for it to keep working this way after embedding Python).
I guess my question could be expressed more concisely as "how can I use Python without installing Python, neither system-wide, nor user-wide?"
There are various ways you could attempt to do this, but none of them are general solutions. From the docs:
5.5. Embedding Python in C++
It is also possible to embed Python in a C++ program; precisely how this is done will depend on the details of the C++ system used; in general you will need to write the main program in C++, and use the C++ compiler to compile and link your program. There is no need to recompile Python itself using C++.
This is the shortest section in the document, and is roughly equivalent to "left as an exercise for the reader". I do not believe you will find any straightforward solutions.
Use pyinstaller to gather the pieces:
This means that my application needs to ship with "pseudo-installed" modules, inside its data directories (just like the application has a folder for icons and other resources, it will need to have a directory for Python modules).
If I needed to tackle this problem, I would use pyinstaller as a base. (Disclosure: I am an occasional contributor.) One of the major functions of pyinstaller is to gather up all of the needed resources for a Python program. In onedir mode, all of the things needed to let the program run are gathered into one directory.
You could include this tool in your make system, and have it place all of the needed pieces into your Python data directory in your build tree.
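As a sketch of what that build-system hook could look like, PyInstaller can also be driven programmatically; the entry script and destination path below are assumptions:

# build step: gather the app's runtime pieces with PyInstaller's onedir mode
import PyInstaller.__main__

PyInstaller.__main__.run([
    "my_app_entry.py",                     # assumed entry script
    "--onedir",                            # collect everything into one directory
    "--distpath", "build/python_runtime",  # assumed destination in the build tree
    "--noconfirm",                         # overwrite previous output without asking
])

The resulting directory holds the interpreter, its shared libraries, and the collected modules, and it stays relocatable, which matches the drag-anywhere requirement above.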
I know the difference between static and dynamic linking in C or C++. But what does it mean in Python? Since it's just an interpreter, with only one style of module import mechanism, how does this make sense?
If I freeze my Python application with cx_Freeze while excluding a specific library, is that a kind of dynamic linking? Because users have to download and install that library themselves in order to run my application.
Actually, my problem is this: I'm using the PySide library (LGPL v2.1) to develop a Python GUI application. The library says I should link to it dynamically to obey its legal terms (same as Qt). In this case, how do I link to PySide dynamically?
In Python there's no static linking. All imports require the correct dependencies to be installed on the target machine; which versions of those libraries to use is our decision.
Now let's come to the binary builders for Python. In this case, we'll have to determine the linking type based on the GNU definitions. If the user can replace the dependency as they like, it's dynamic; if the dependency is attached to the binary itself, it's static linking. In the case of cx_Freeze or PyInstaller, if we build as one file, it's static linking; if we build in normal mode, where all the dependencies are collected as separate files, it's dynamic linking. The idea is whether we can replace the dependency on the target machine or not.
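To make the PySide case concrete, one sketch of the "dynamic" arrangement is to exclude PySide from the frozen build (for example via cx_Freeze's excludes option) and resolve it at run time from whatever copy the user has installed; the error message here is illustrative:

# the frozen app resolves PySide from the target machine's own installation,
# so the user can replace or upgrade the library independently of the app
try:
    import PySide.QtGui  # found in the user's site-packages, not bundled
except ImportError:
    raise SystemExit("This application requires PySide; please install it separately.")

Whether this satisfies the LGPL in your situation is a legal question rather than a technical one, but it is the Python analogue of leaving the library replaceable.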
I plan to use PyInstaller to create a stand-alone Python executable. PyInstaller comes with built-in support for UPX and uses it to compress the executable, but the result is still really huge (about 2.7 MB).
Is there any way to create even smaller Python executables? For example, using a shrunken python.dll or something similar?
If you recompile pythonxy.dll, you can omit modules that you don't need. Going by size, stripping out the Unicode database and the CJK codecs creates the largest reduction. This, of course, assumes that you don't need them. Remove the modules from the pythoncore project, and also remove them from PC/config.c.
Using an earlier Python version will also decrease the size considerably if you really need a small file size. I don't recommend using a very old version; Python 2.3 would be the best option. I got my Python executable down to about 700 KB! Also, I prefer py2exe over PyInstaller.
You can't go too low in size, because you obviously need to bundle the Python interpreter, and that alone takes a considerable amount of space.
I had the same concerns once, and there are two approaches:
Install Python on the computers you want to run on, and only distribute the scripts.
Install Python on some shared drive on the internal network, and rig the users' PATH to point at where Python is located. With some installation script / program trickery, users can be completely oblivious to this, and you'll get to distribute minimal applications.