deploying python applications

deploying python applications - python

Is it possible to deploy python applications such that you don't release the source code and you don't have to be sure the customer has python installed?
I'm thinking maybe there is some installation process that can run a python app from just the .pyc files and a shared library containing the interpreter or something like that?
Basically I'm keen to get the development benefits of a language like Python - high productivity etc. but can't quite see how you could deploy it professionally to a customer where you don't know how there machine is set up and you definitely can't deliver the source.
How do professional software houses developing in python do it (or maybe the answer is that they don't) ?

You protect your source code legally, not technologically. Distributing py files really isn't a big deal. The only technological solution here is not to ship your program (which is really becoming more popular these days, as software is provided over the internet rather than fully installed locally more often.)
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.

Yes, it is possible to make installation packages. Look for py2exe, cx_freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.

I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython compatible C code. You also get a small speed boost (~25% last I checked) since it will be compiled to native machine code instead of just Python byte code. You still need to be sure the user has Python installed (either by making it a pre-requisite pushed off onto the user to deal with, or bundling it as part of the installer process). Also, you do need to have at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
import cython_compiled_module
if __name__ == '__main__':
cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
As others stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much more close to the type of obfuscation you were initially hoping for.

Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.

Build a web application in python. Then the world can use it via a browser with zero install.

Related

How to protect my Python code before distribution?

I have written a python code which takes an input data file, performs some processing on the data and writes another data file as output.
I should distribute my code now but the users should not see the source code but be able to just giving the input and getting the output!
I have never done this before.
I would appreciate any advice on how to achieve this in the easiest way.
Thanks a lot in advance

As Python is an interpreted language by design; and as it compiles code to a bytecode (- which doesn't help the fact you're trying to conceal it, as bytecodes are easier to reverse -) there's no real secure way to hide your source code whereby it is not recoverable, as is true for any programming language, really.
Initially, if you'd wanted to work with a language that can't be so easily reversed- you should've gone for a more native language which compiles directly to the underlying architecture's machine code which is significantly harder to reproduce in the original language let alone read due to neat compiler optimizations, the overhead given by CISC et cetera.
However, some libraries that do convert your source code into an executable format (by packing the Python interpreter and the bytecode alongside it) can be used such as:
cx_Freeze - for freezing any code >=Python 2.7 for any platform, allegedly.
PyInstaller - for freezing general purpose code, it does state additionally that it works with third-party libraries.
py2exe -for freezing code into Windows-only executable format.
Or you might consider a substitute for this, which is code obfuscation which still allows the user to read the source code however make it near-to-impossible to read.
However, an issue brought up with this is that, it'd be harder for code addition as bad code obfuscation techniques could make the code static. Also, on the latter case, the code could have overhead brought by redundant code meant to fool or trick the user into thinking the code is doing something which it is not.
Also in general it negates the standard practice of open-source which is what Python loves to do and support.
So to really conclude, if you don't want to read everything above; the first thing you did wrong was choose Python for this, a language that supports open source and is open source as well. Thus to mitigate the issue you should either reconsider the language, or follow the references above to links to modules which might help aide basic source code concealment.

Firstly, as Python is an interpreted language, I think you cannot completely protect your Python code, .pyc files can be uncompiled to get back .py files (using uncompyle6 for example).
So the only thing you can do is make it very hard to read.
I recommend to have a look at code obfuscation, which consists in making your code unreadable by changing variables/function names, removing comments and docstrings, removing useless spaces, etc. Pyminifier does that kind of things.
You can also write your own obfuscation script.
Then you can also turn your program into a single executable (using pyinstaller for example). I am pretty sure there is a way to get .py files back from the executable, but it just makes it harder. Also beware of cross-platform compatibility when making an executable.

Going through above responses, my understanding is that some of the strategies mentioned may not work if your client wants to execute your protected script along with other unprotected scripts.
One other option is to encrypt your script and then use an interpreter that can decrypt and execute it. It too has some limitations.
ipepycrypter is a suite that helps protect python scripts. This is accomplished by hiding script implementation through encryption. The encrypted script is executed by modifed python interpreter. ipepycrypter consists of encryption tool ipepycrypt and python interpreter ipepython.
More information is available at https://ipencrypter.com/user-guides/ipepycrypter/

One other option, of course, is to expose the functionality over the web, so that the user can interact through the browser without ever having access to the actual code.

There are several tools which compile Python code into either (a) compiled modules usable with CPython, or (b) a self-contained executable.
https://cython.org/ is the best known, and probably? oldest, and it only takes a very small amount of effort to prepare a traditional Python package so that it can be compiled with Cython.
http://numba.pydata.org/ and https://pythran.readthedocs.io/ can also be used in this way, to produce Python compiled modules such that the source doesnt need to be distributed, and it will be very difficult to decompile the distributable back into usable source code.
https://mypyc.readthedocs.io is newer player, an offshoot of the mypy toolkit.
Nuitka is the most advanced at creating a self-contained executable. https://github.com/Nuitka/Nuitka/issues/392#issuecomment-833396517 shows that it is very hard to de-compile code once it has passed through Nuitka.
https://github.com/indygreg/PyOxidizer is another tool worth considering, as it creates a self-contained executable of all the needed packages. By default, only basic IP protection is provided, in that the packages inside it are not trivial to inspect. However for someone with a bit of knowledge of the tool, it is trivial to see the packages enclosed within the binary. However it is possible to add custom module loaders, so that the "modules" in the binary can be stored in unintelligible formats.
Finally, there are many Python to C/go/rust/etc transpilers, however these will very likely not be usable except for small subsets of the language (e.g. will 3/0 throw the appropriate exception in the target language?), and likely will only support a very limited subset of the standard library, and are unlikely to support any imports of packages beyond the standard library. One example is https://github.com/py2many/py2many , but a search for "Python transpiler" will give you many to consider.

Compile Python Code In Objective-C [duplicate]

So it's a new millennium; Apple has waved their hand; it's now legal to include a Python interpreter in an iPhone (App Store) app.
How does one go about doing this? All the existing discussion (unsurprisingly) refers to jailbreaking. (Older question: Can I write native iPhone apps using Python)
My goal here isn't to write a PyObjC app, but to write a regular ObjC app that runs Python as an embedded library. The Python code will then call back to native Cocoa code. It's the "control logic is Python code" pattern.
Is there a guide to getting Python built in XCode, so that my iPhone app can link it? Preferably a stripped-down Python, since I won't need 90% of the standard library.
I can probably figure out the threading and Python-extension API; I've done that on MacOS. But only using command-line compilers, not XCode.

It doesn't really matter how you build Python -- you don't need to build it in Xcode, for example -- but what does matter is the product of that build.
Namely, you are going to need to build something like libPython.a that can be statically linked into your application. Once you have a .a, that can be added to the Xcode project for your application(s) and, from there, it'll be linked and signed just like the rest of your app.
IIRC (it has been a while since I've built python by hand) the out-of-the-box python will build a libPython.a (and a bunch of other libraries), if you configure it correctly.
Of course, your second issue is going to be cross-compiling python for ARM from your 86 box. Python is an autoconf based project and autoconf is a pain in the butt for cross-compilation.
As you correctly state, making it small will be critical.
Not surprising, either, is that you aren't the first person to want to do this, but not for iOS. Python has been squeezed into devices much less capable than those that run iOS. I found a thread with a bunch of links when googling about; it might be useful.
Also, you might want to join the pyobjc-dev list. While you aren't targeting a PyObjC based application (which, btw, is a good idea -- PyObjC has a long way to go before it'll be iOS friendly), the PyObjC community has been discussing this and Ronald, of anyone, is probably the most knowledgeable person in this particular area. Note that PyObjC will have to solve the embedded Python on iOS problem prior to porting PyObjC. Their prerequisite is your requirement, as it were.

I've put a very rough script up on github that fetches and builds python2.6.5 for iPhone and simulator.
http://github.com/cobbal/python-for-iphone
Work in progress
Somewhat depressing update nearly 2 years later: (copied from README on github)
This project never really got python running on the iPhone to my
satisfaction, and I can't recommend using it for any serious project
at this stage.
Most notably missing is pyobjc support (which turns out to be much
harder to port to iPhone since it relies on more platform-specific
code)
Also missing is the ability to statically compile modules, (all are
currently built as dylibs which works for development, but to my
knowledge wouldn't be allowed in the App Store)
At this point this project is mostly meant to be a starting point for
anyone smarter than me who wants to and can tackle the above issues.
I really wish it were practical to write apps entirely in Python, but
at this point it seems impossible.

I also started such a project. It comes with its own simplified compile script so there is no need to mess around with autoconf to get your cross compiled static library. It is able to build a completely dependency-free static library of Python with some common modules. It should be easily extensible.
https://github.com/albertz/python-embedded/

Why do C programs require decompilers but python programs dont?

If I write a python script, anyone can simply point an editor to it and read it. But for programming written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.

Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. It's also optional: when running a non-module, i.e. an end-user script, Python will just interpret the code rather than compiling it first.
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (i.e. linux distributions) became more established. Some distros, for example gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under linux, or I think even OS/X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.

Python is a script language, runs in a virtual machine through an interpeter.
C is a compiled language, the code compiled to binary code which the computer can run without all that extra stuff Python needs.

This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is the Python is an "interpreted" language, which means that it requires a machine language program (the python interpreter) to run the python program, adding a layer of indirection. C or C++ are different. They are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.

In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin Python-to-C++ produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shedskin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.

Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.

you can't open up and read the code that actually runs for python either. Try
import dis
def foo():
for i in range(100):
print i
print dis.dis(foo)
That will show you the (human readable) bytcode of the foo program. equivalently, you can save the file and import it from the interactive python interpreter. This will create a .pyc file with the same basename as the script. open that with a hex editor and you are looking at the actually python bytecode.
The reason for the difference is that python changes up it's byte code between releases so that you would either need to distribute a different version of a binary only release for each version of python. This would be a pain.
With C, it's compiled to native code and so the byte code is much more stable making binary only releases possible.

because C code is complied to object (machine) code and python code is compiled into an intermediate byte code. I am not sure if you are even referring to the byte code of python - you must be referring to the source file itself which is directly executable (hiding the byte code from you!). C needs to be compiled and linked.

Python scripts are parsed and converted to binary only when they're run - i.e., they're text files and you can read them with an editor.
C code is compiled and linked to an executable binary file before they can be run. Normally, only this executable binary file is distributed - hence you need a decompiler. You can always view the source code, if you've access to it.

Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.

Python scripts are analogous to a man looking at a to-do list written in English (or language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the python program source, you see the to-do list. In case of the robot, you see the gears, motors and batteries, etc, which look very different from the to-do list. If you could get hold of the C "to-do" list, it looks somewhat like the python code, just in a different language.

G-WAN executes ANSI C scripts on the fly -making it just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...

Real-world Jython applications

I recently started learning Python. Not yet ventured into coding.
During one of my learning sessions, i came accross the term Jython.
I googled it & got some information.
I would like to know if anyone has implemented any real-world program using Jython.

Most of the time, Jython isn't used directly to write full read-world programs, but a lot of programs actually embed Jython to use it as a scripting language.
The official Jython website gives a list of projects, some written in Jython, others using Jython for scripting:
http://wiki.python.org/jython/JythonUsers

I am writing a full application in Jython at the moment, and would highly recommend it. Having all of the Java libraries at your disposal is very handy, and the Python syntax and language features actually make using some of them easier than it is in Java (I'm mostly talking about Swing here).
Check out the chapter on GUI Applications from the Jython book. It does a lot of comparisons like 'Look at all this Java code, and now look at it reduced to Python code of half the length!'.
The only caveats I've found are:
Jython development tends to run slightly behind Python, which can be annoying if you find a cool way of doing something in Python, only to discover it's not supported in the current Jython version.
Occasionally you might have hiccups with the interface between Python and Java (I have a couple of unsolved problems here and here, although there are always workarounds for this kind of thing).
Distribution is not as simple as it could be, although once you figure out how to do it, it's fairly painless. I recommend following the method here. It essentially consists of:
Exploding jython.jar and adding your own modules into it.
Writing and compiling a small Java class that creates a Python interpreter and loads up your Python modules.
Creating an executable .jar file consisting of the jython.jar modules, your own Python modules, and the Java class.

Jython really shines for dependency injection.
You know those pesky variables you have to give your program, like
file system paths
server names
ports
Jython provides a really nice way of injecting those variables by putting them in a script. It works equally well for injecting java dependencies, as well.

WebSphere and WebLogic use it as their default scripting engine for administrative purposes.
A lot of other Oracle products ship it as part of their "oracle_commons" module (Oracle Universal Installer, Oracle HTTP Server etc). It's mostly version 2.2 being deployed though, which is a bit old and clunky.

There is a list of application that uses jython at http://wiki.python.org/jython/JythonUsers

Would Python make a good substitute for the Windows command-line/batch scripts?

I've got some experience with Bash, which I don't mind, but now that I'm doing a lot of Windows development I'm needing to do basic stuff/write basic scripts using
the Windows command-line language. For some reason said language really irritates me, so I was considering learning Python and using that instead.
Is Python suitable for such things? Moving files around, creating scripts to do things like unzipping a backup and restoring a SQL database, etc.

Python is well suited for these tasks, and I would guess much easier to develop in and debug than Windows batch files.
The question is, I think, how easy and painless it is to ensure that all the computers that you have to run these scripts on, have Python installed.

Summary
Windows: no need to think, use Python.
Unix: quick or run-it-once scripts are for Bash, serious and/or long life time scripts are for Python.
The big talk
In a Windows environment, Python is definitely the best choice since cmd is crappy and PowerShell has not really settled yet. What's more Python can run on several platform so it's a better investment. Finally, Python has a huge set of library so you will almost never hit the "god-I-can't-do-that" wall. This is not true for cmd and PowerShell.
In a Linux environment, this is a bit different. A lot of one liners are shorter, faster, more efficient and often more readable in pure Bash. But if you know your quick and dirty script is going to stay around for a while or will need to be improved, go for Python since it's far easier to maintain and extend and you will be able to do most of the task you can do with GNU tools with the standard library. And if you can't, you can still call the command-line from a Python script.
And of course you can call Python from the shell using -c option:
python -c "for line in open('/etc/fstab') : print line"
Some more literature about Python used for system administration tasks:
The IBM lab point of view.
A nice example to compare bash and python to script report.
The basics.
The must-have book.

Sure, python is a pretty good choice for those tasks (I'm sure many will recommend PowerShell instead).
Here is a fine introduction from that point of view:
http://www.redhatmagazine.com/2008/02/07/python-for-bash-scripters-a-well-kept-secret/
EDIT: About gnud's concern: http://www.portablepython.com/

Are you aware of PowerShell?

Anything is a good replacement for the Batch file system in windows. Perl, Python, Powershell are all good choices.

#BKB definitely has a valid concern. Here's a couple links you'll want to check if you run into any issues that can't be solved with the standard library:
Pywin32 is a package for working with low-level win32 APIs (advanced file system modifications, COM interfaces, etc.)
Tim Golden's Python page: he maintains a WMI wrapper package that builds off of Pywin32, but be sure to also check out his "Win32 How Do I" page for details on how to accomplish typical Windows tasks in Python.

Python is certainly well suited to that. If you're going down that road, you might also want to investigate SCons which is a build system itself built with Python. The cool thing is the build scripts are actually full-blown Python scripts themselves, so you can do anything in the build script that you could otherwise do in Python. It makes make look pretty anemic in comparison.
Upon rereading your question, I should note that SCons is more suited to building software projects than to writing system maintenance scripts. But I wouldn't hesitate to recommend Python to you in any case.

As a follow up, after some experimentation the thing I've found Python most useful for is any situation involving text manipulation (yourStringHere.replace(), regexes for more complex stuff) or testing some basic concept really quickly, which it is excellent for.
For stuff like SQL DB restore scripts I find I still usually just resort to batch files, as it's usually either something short enough that it actually takes more Python code to make the appropriate system calls or I can reuse snippets of code from other people reducing the writing time to just enough to tweak existing code to fit my needs.
As an addendum I would highly recommend IPython as a great interactive shell complete with tab completion and easy docstring access.

I've done a decent amount of scripting in both Linux/Unix and Windows environments, in Python, Perl, batch files, Bash, etc. My advice is that if it's possible, install Cygwin and use Bash (it sounds from your description like installing a scripting language or env isn't a problem?). You'll be more comfortable with that since the transition is minimal.
If that's not an option, then here's my take. Batch files are very kludgy and limited, but make a lot of sense for simple tasks like 'copy some files' or 'restart this service'. Python will be cleaner, easier to maintain, and much more powerful. However, the downside is that either you end up calling external applications from Python with subprocess, popen or similar. Otherwise, you end up writing a bunch more code to do things that are comparatively simple in batch files, like copying a folder full of files. A lot of this depends on what your scripts are doing. Text/string processing is going to be much cleaner in Python, for example.
Lastly, it's probably not an attractive alternative, but you might also consider VBScript as an alternative. I don't enjoy working with it as a language personally, but if portability is any kind of concern then it wins out by virtue of being available out of the box in any copy of Windows. Because of this I've found myself writing scripts that were unwieldy as batch files in VBScript instead, since I can't usually depend on Python or Perl or Bash being available on Windows.

Python, along with Pywin32, would be fine for Windows automation. However, VBScript or JScript used with the Windows Scripting Host works just as well, and requires nothing additional to install.

I've been using a lot of Windows Script Files lately. More powerful than batch scripts, and since it uses Windows scripting, there's nothing to install.

As much as I love python, I don't think it a good choice to replace basic windows batch scripts.
I can't see see someone having to import modules like sys, os or getopt to do basic things you can do with shell like call a program, check environment variable or an argument.
Also, in my experience, goto is much easier to understand to most sysadmins than a function call.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.