Compiling Python statically

Compiling Python statically - python

I can use
python -m py_compile mytest.py
And it will byte-compile the file. From reading some other documentation, it was my impression that it byte-compiled any modules imported. But if I change any of the files it imports, I see the changed functionality. Is there some way to completely compile a python script and modules it imports, so that any changes to the originals don't reflect? I want to do this for security purposes, essentially creating a "trusted" version which can't be subverted by changing the functionality of any modules that it calls.

If you compile to bytecode, then delete the source files, then the bytecode can't change. But if someone has the ability to change source files on your machine, they can also change bytecode files on your machine. I don't think this will give you any actual security.
If you want a single-file Python program, you can run from a zip file.
Another option is to use cx_freeze or a similar program to compile the program into a native executable.

We solved this problem by developing our own customer loader using signet http://jamercee.github.io/signet/. Signet will scan your python source and it's dependencies and calculate sha1 hashes, which it embeds in the loader. You deliver the loader AND your script to your users with instructions they run the loader. On invocation, it re-calculates the hashes and if they match, control is then transferred to your script. Signet also supports code signing and PE verification.

Freeze/Bundle:
bbfreeze cross-platform
cx_freeze cross-platform
py2exe Windows
py2app OS X
pyinstaller cross-platform
Cryptography:
Assuming your "process" for running "scripts" is secure and cannot be tampered with:
Create secure hashes of the scripts and record them (e.g: SHA1)
When executing "scripts" ensure their cryptographic hashes match (ensuring they haven't been tampered with).
A common approach to this for secure protocols and apis is to use HMAC

Related

Where does PyCharm store compiled files?

I'm new to PyCharm/Python, and can't figure out where the IDE stores compiled python *.pyc files.
Coming from the IntelliJ world, it is strange that I don't see any menu options to re-build the project, or build individual files.
I'm also unable to find any pyc files while searching the project directory, so basically, I've no idea whether successful compilation has happened at all, although the GitHub imported project is error free.
What can I do here?

Because most Python implementations are interpreted rather than a compiled, the compilation step happens when you run the code. This is why the PyCharm UI features a prominent "Run" button (▶️) but no compile button.
It is true that for CPython there is a compilation step which compiles from the Python code to bytecode, but this is an implementation detail. CPython 3 stores its cached compilation results in .pyc files in a directory called __pycache__. These files are automatically generated when a module is imported (using import module will result in a module.pyc file) but not when a normal program is run.
Lastly, as per #shmee's comment, it is possible to compile a source file with the py_compile module, but I should emphasise that this is not usually done or necessary.
Now, if you are worried about checking that your code is correct, in the interpreted language world we rely more strongly on testing. I would recommend that you investigate tests for your code (using pytest and the excellent test integration in PyCharm).

Let me begin with a bit on terminology:
Python is a programming language. It's "just" the programming language specification.
CPython is the reference implementation of the Python language. It's actually just one of several different Python interpreters. CPython itself works (let's call it an implementation detail) by translating (but you could also say compiling) the code in imported Python files/modules to bytecode and then executing that bytecode. It actually stores the translation as .pyc files in the folder of that file) to make subsequent imports faster, but that's specific to CPython and can also be disabled.
PyCharm is an integrated development environment. However it requires to "Configure a Python Interpreter" to run Python code.
That means that PyCharm isn't responsible for creating .pyc files. If you configured a non-CPython interpreter or used the environmental variable to disable the pyc file creation there won't be any pyc files.
But if you used an appropriate CPython interpreter in PyCharm it will create .pyc files for the files/modules you successfully imported. That means you actually have to import or otherwise run the Python files in your project to get the .pyc files.
Actually the Python documentation contains a note about the "compiled" Python files:
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of Python to coexist.
Python checks the modification date of the source against the compiled version to see if it’s out of date and needs to be recompiled. This is a completely automatic process. Also, the compiled modules are platform-independent, so the same library can be shared among systems with different architectures.
Python does not check the cache in two circumstances. First, it always recompiles and does not store the result for the module that’s loaded directly from the command line. Second, it does not check the cache if there is no source module. To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module.
Some tips for experts:
You can use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, the -OO switch removes both assert statements and doc strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
A program doesn’t run any faster when it is read from a .pyc file than when it is read from a .py file; the only thing that’s faster about .pyc files is the speed with which they are loaded.
The module compileall can create .pyc files for all modules in a directory.
There is more detail on this process, including a flow chart of the decisions, in PEP 3147.

how to distribute a package with out exposing the source code in python?

Consider that I have a package called "A" consisting of several modules and also nested packages. Now, I want to distribute this package to user and I do not want user to see my code at all. I heard that ".pyc" can be de-compiled. So, I am just wondering what could be the other alternatives for this problem.
It would be great if someone gives some ideas in this regard.

You actually have few options. First, you can compile your code into pyc files. However, this can be circumvented with the disassembler library dis, but this requires a lot of technical know-how. You can also use py2exe to package it as an exe file; this converts the pyc file into an exe file. This can still be disassembled but adds an extra layer. You also have a few encryption solutions; for example you can use pyconcrete to encrypt your imports until they are loaded into memory. You can also just encryption the entire application, then ship the decrypter and launcher with it as a C/C++ application (or any other compiled language). Lastly, if you are comfortable with getting python to run custom C/C++ code, you can also put your private code into a DLL or SO and call it directly for the script.

Python is an interpreted language. That means that if you want to distribute pyc files you'll have to have them run on the same OS/architecture as yours or you'll run into subtle problems. That, and the fact that most code can be decompiled to some degree, would urge me to rethink your use case.
Can you rethink your package as a service instead?

How to create Python module installer on Windows without including source files

I would like to create a Windows Installer of my Python module (as explained here https://docs.python.org/2/distutils/builtdist.html) without including the sources, meaning that the installer should only embed the Python bytecode.
I followed the solution explained here https://stackoverflow.com/a/3444346/343201 and it only works for ZIP distributions. In other words, if I create a ZIP built distribution, I manage to exclude all sources and export only bytecode (.pyc). On the other hand, if I create a MSI/WININST built distribution, it will only contain sources (.py).
I looked at the output of bdist_wininst and looks like when creating a Windows installer, only source files are exported, bytecode is ignored. Why? Is there a way to enforce the installer to embed only bytecode and no source files?
Thanks

You can use py2exe, though it uses your script files. It will make a .exe file and the source code stays locked.
You can try every way around. Head over here and try http://www.py2exe.org/old/

Python .pyc files and Windows UAC

I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?

PYC files aren't guaranteed to be compatible for different python versions. If you don't know that all your customers are running the same python versions, you really don't want to distribute pyc's directly. So, you have to choose between distributing PYCs and supporting multiple python versions.
You could create build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools.; however it will be awkward to do because you'll have to run py_compile in every version you need to support.
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your python is supposed to be integrated into the user's python install, then it's probably simpler to just create a zip of your .py files and add a one-line .py stub that imports the zipped package(s) using zipfile
if it makes you feel better, PYC doesn't provide much extra security and it doesn't really boost perf much either :)

If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked your supported versions, you said, "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very linux-centric—mainly because it's primarily linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so they were mostly ignored…)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory), and also for each machine the user's roaming profile might end up on, which means you can safely put stuff there and know that the user has the permissions to write there without UAC getting involved, and also know (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.

Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?

Trimming Python Runtime

We've got a (Windows) application, with which we distribute an entire Python installation (including several 3rd-party modules that we use), so we have consistency and so we don't need to install everything separately. This works pretty well, but the application is pretty huge.
Obviously, we don't use everything available in the runtime. I'd like to trim down the runtime to only include what we really need.
I plan on trying out py2exe, but I'd like to try and find another solution that will just help me remove the unneeded parts of the Python runtime.

One trick I've learned while trimming down .py files to ship: Delete all the .pyc files in the standard library, then run your application throughly (that is, enough to be sure all the Python modules it needs will be loaded). If you examine the standard library directories, there will be .pyc files for all the modules that were actually used. .py files without .pyc are ones that you don't need.

Both py2exe and pyinstaller (NOTE: for the latter use the SVN version, the released one is VERY long in the tooth;-) do their "trimming" via modulefinder, the standard library module for finding all modules used by a given Python script; you can of course use the latter yourself to identify all needed modules, if you don't trust pyinstaller or py2exe to do it properly and automatically on your behalf.

This py2exe page on compression suggests using UPX to compress any DLLs or .pyd files (which are actually just DLLs, still). Obviously this doesn't help in trimming out unneeded modules, but it can/will trim down the size of your distribution, if that's a large concern.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.