In my ad-hoc education, I think I might have missed this concept. Say I wanted to distribute this app:
from very_large_external_package.large_module import HelloWorld

if __name__ == '__main__':
    HelloWorld().run()
The .py file is very small and the compiled .pyc version is even smaller; however, if I wanted to distribute my application, say as an executable, I would need to include very_large_external_package, or at the very least large_module.
I've looked at Snakefood as a possible solution for my real-world application. It returned many dependencies of dependencies, meaning that I would need to go through each of them and start chopping out otherwise very good code, which gave me the feeling that I was going about this the wrong way.
How do python developers manage application size with imports?
Thanks in advance.
Options:
Don't worry about it. Simply require that very_large_external_package be installed in order to use your application. This is very common. (SciPy, NumPy, Matplotlib, and also PyQt, are not small and are very often required. I'm sure others could provide many more examples of this.)
Don't worry about it. Use something like PyInstaller and simply deal with the large, dependency-free binaries it generates (see the example after this list).
It just really isn't reasonable (or maintainable) to go hacking out parts of 3rd party libraries to use in your application. (Unless you're talking about a very small, isolated piece, which is probably not the case.)
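For instance, PyInstaller can typically produce a single self-contained executable from an entry script with a command along the lines of pyinstaller --onefile your_app.py, where your_app.py stands in for whatever your entry point is; the exact flags depend on your project.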
Related
Why does Qt seem to need "resource files"? I mean the things you compile with pyside2-rcc resources.qrc -o resources.py, for example.
The documentation seems to say that it has something to do with accessing paths, abstracted in such a way that Python doesn't trip over the differences in a cross-platform application. But IME, Python does fine with Linux/Unix-style pathnames on Windows, especially if you use pathlib or os.path.join() or something.
In the days I used to use DOS, I found that command.com had problems with /, but you could binary-patch command.com to rename cd to something else, and use a cd.exe that worked with /'s fine - it was command.com that had the problem, not the underlying OS.
https://doc.qt.io/qtforpython/overviews/resources.html seems to say it has something to do with not losing files, which seems kinda odd.
So why does Qt need resource files, and can I safely skip them?
TL;DR: We are not required to use qresource; it is only an option.
It is not that Qt makes qresources mandatory; rather, they are an option that the Qt world provides us. That is, using them or not generally brings neither benefit nor harm.
From a Qt/C++ point of view, resources allow us to embed files inside the binary itself, and that abstraction avoids possible problems with filesystem paths. Since PySide2 is a Qt wrapper, it also exposes that tool.
And as you point out, there are several Python libraries that already handle paths in a generic way, avoiding the well-known path problems between operating systems, so you could use those instead.
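To make the trade-off concrete, here is a small hypothetical sketch (the icon file, the icons directory, and the resources module are made up for illustration): the same icon loaded once through a compiled resource module and once through an ordinary filesystem path.

# Both variants assume a QApplication has already been created.

# Option A: qresource route -- compile resources.qrc with pyside2-rcc,
# then importing the generated module registers the ':/...' paths.
from PySide2.QtGui import QIcon

import resources  # noqa: F401  (hypothetical module generated by pyside2-rcc)

icon_a = QIcon(":/icons/app.png")

# Option B: plain filesystem route -- no .qrc file at all,
# just a portable path built with pathlib.
from pathlib import Path

icon_b = QIcon(str(Path(__file__).parent / "icons" / "app.png"))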
I have been developing a fairly extensive library of Python modules that automate the more time-consuming parts of "3D character development" for games/film/TV.
Up until a few months ago all of my code ran within Maya's dedicated Python interpreter; however, my GUIs are built in PySide/PyQt and so run just fine on Mac/Windows/Linux, or in a few other graphics programs such as Nuke, XSI, and Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people, using various different operating systems, and potentially using various applications (Nuke, XSI, Max), which in turn have their own dedicated Python interpreters.
The obvious options would be pip and easy_install. These tools are clearly the "right" way to go, but it's not really clear how a user would install/run them under the dedicated Python installs that ship with Maya/Nuke/etc. Though it does seem possible (as explained here), it's still going to be a pretty big barrier for a less technical user.
Any help or pointers in the right direction would be immensely appreciated.
I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment - all the modules and so on. userSetup.py adds that zip to the path and Python's native zipimport functionality handles the rest. This makes sure that there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havoc after .py files get moved or renamed. Since this is all standard Python, I'd assume this will work for any app-specific interpreter that uses a 2.6+ version of Python, though I've never tried it in Nuke or Max.
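A minimal sketch of such a userSetup.py (the zip name and location are made up; in practice it just needs to point at wherever your distribution places the file):

# userSetup.py -- Maya executes this on startup when it is on the script path.
# Add the zipped code bundle to sys.path; Python's zipimport machinery then
# lets you import pure-Python modules straight out of the zip.
import os
import sys

bundle = os.path.expanduser("~/maya/scripts/studio_tools.zip")  # illustrative location
if bundle not in sys.path:
    sys.path.insert(0, bundle)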
The main wrinkle will be modules with .pyd or other binary components; these typically don't work from inside zip files. I include a bootstrap routine which unpacks them to a (disposable) location on the user's disk and adds that to the path.
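The bootstrap part might look roughly like this (a sketch in Python 3 syntax with invented names; real packages with compiled extensions usually need their directory structure preserved when extracted):

# Extract compiled extensions (.pyd/.so), which cannot be imported from a zip,
# into a disposable directory and put that directory on sys.path too.
import os
import sys
import tempfile
import zipfile

def unpack_binaries(bundle_path):
    target = os.path.join(tempfile.gettempdir(), "studio_tools_bin")  # disposable location
    if not os.path.isdir(target):
        os.makedirs(target)
    with zipfile.ZipFile(bundle_path) as zf:
        for name in zf.namelist():
            if name.endswith((".pyd", ".so")):
                zf.extract(name, target)  # keeps the archive's relative layout
    if target not in sys.path:
        sys.path.insert(0, target)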
There's a detailed discussion of the method here and some background here
I'm trying to build a project which includes a few open source third party libraries, but I want to bundle my distribution with said libraries - because I expect my users to want to use my project without an Internet connection. Additionally, I'd like to leave their code 100% untouched and even leave their directory structure untouched if I can. My approach so far has been to extract the tarballs and place the entire folder in MyProject/lib, but I've had to put __init__.py in every sub-directory to be able to reference the third-party code in my own modules.
What's the common best practice for accomplishing this task? How can I best respect the apps' developers by retaining their projects' structure? How can I make importing these libraries in my code less painful?
I'm new to making a distributable project in Python, so I've no clue if there's something I can do in __init__.py or in setup.py to keep myself from having to type from lib.app_name.app_name.app_module import * and whatnot.
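For example, I was imagining something like this in lib/__init__.py (completely made up, and I have no idea whether it's sane), so that each bundled project becomes importable under its own name:

# lib/__init__.py -- add each vendored project's top-level directory to
# sys.path so its package can be imported untouched (e.g. import app_name).
import os
import sys

_lib_dir = os.path.dirname(os.path.abspath(__file__))
for _entry in os.listdir(_lib_dir):
    _project_dir = os.path.join(_lib_dir, _entry)
    if os.path.isdir(_project_dir) and _project_dir not in sys.path:
        sys.path.insert(0, _project_dir)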
For what it's worth, I will be distributing this on OS X and possibly *nix. I'd like to avoid using another library (e.g. setuptools) to accomplish this. (Weirdly, it seems to already be installed on my system, but I didn't install it, so I've no idea what's going on.)
I realize that this question seems to be a duplicate of this one, but I don't think it is because I'm asking for the best practice for the "distribute-the-third-party-code-with-your-own" approach. Please forgive me if I'm asking a garbage question.
buildout is a good solution for building and distributing Python software.
You could have a look at the inner workings of virtualenv to get some inspiration on how to go about this. Maybe you can reuse code from there.
Is it possible to deploy python applications such that you don't release the source code and you don't have to be sure the customer has python installed?
I'm thinking maybe there is some installation process that can run a python app from just the .pyc files and a shared library containing the interpreter or something like that?
Basically I'm keen to get the development benefits of a language like Python - high productivity and so on - but can't quite see how you could deploy it professionally to a customer when you don't know how their machine is set up and you definitely can't deliver the source.
How do professional software houses developing in Python do it (or maybe the answer is that they don't)?
You protect your source code legally, not technologically. Distributing .py files really isn't a big deal. The only technological solution here is not to ship your program at all (an approach that is becoming more popular these days, as software is more often provided over the internet than installed locally).
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.
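As a rough illustration of the freezing route, a cx_Freeze setup script can be as small as the following (names are placeholders):

# setup.py -- minimal cx_Freeze configuration; run "python setup.py build"
# to produce a build/ directory containing the executable plus a bundled Python.
from cx_Freeze import setup, Executable

setup(
    name="MyApp",
    version="1.0",
    description="Example frozen application",
    executables=[Executable("main.py")],
)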
Yes, it is possible to make installation packages. Look for py2exe, cx_freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.
I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython-compatible C code. You also get a small speed boost (~25% last I checked), since it is compiled to native machine code instead of just Python bytecode. You still need to be sure the user has Python installed (either by making it a prerequisite pushed off onto the user to deal with, or by bundling it as part of the installer process). Also, you need to keep at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
import cython_compiled_module

if __name__ == '__main__':
    cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
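For completeness, the compile step itself is usually driven by a small build script along these lines (a sketch using Cython's standard setuptools integration; cython_compiled_module.pyx would hold the code you want to keep out of plain sight):

# setup.py -- compile cython_compiled_module.pyx into a native extension
# with "python setup.py build_ext --inplace".
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="cython_compiled_module",
    ext_modules=cythonize("cython_compiled_module.pyx"),
)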
As others have stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much closer to the type of obfuscation you were initially hoping for.
Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.
Build a web application in Python. Then the world can use it via a browser with zero install.
We've been doing a fair amount of Python scripting, and now we have a directory with almost a hundred loosely related scripts. It's obviously time to organize this, but there's a problem. These scripts import freely from each other, and although code reuse is generally a good thing, it makes it quite complicated to organize them into directories.
There are a few things that you should know about our corporate environment:

I don't have access to the users' environment. Editing the PYTHONPATH is out, unless it happens in the script itself.

Users don't install things. Systems are expected to be already installed and working, so setup.py is not a solution unless I can run it once for all users.
I'm quite willing to edit my import statements and do some minor refactoring, but the solutions I see currently require me to divide all the code strictly between "user runnable scripts" and "libraries", which isn't feasible, considering the amount of code.
Has anyone out there solved a similar problem? Are you happy with it?
--Buck
Another way to state the same question:
Looking at Google Code Search, this kind of code is rampant (below). Is everyone happy with this? Is there a good alternative?
sys.path.insert(0, os.path.dirname(os.path.dirname(
    os.path.dirname(os.path.abspath(__file__))
)))