I write tools that are used in a shared workspace. Since there are multiple OS's working in this space, we generally use Python and standardize the version that is installed across machines. However, if I wanted to write some things in C, I was wondering if maybe I could have the application wrapped in a Python script, that detected the operating system and fired off the correct version of the C application. Each platform has GCC available and uses the same shell.
One idea was to have the C compiled to the users local ~/bin, with timestamp comparison with C code so it is not compiled each run, but only when code is updated. Another was to just compile it for each platform, and have the wrapper script select the proper executable.
Is there an accepted/stable process for this? Are there any catches? Are there alternatives (assuming the absolute need to use native C code)?
Clarification: Multiple OS's are involved that do not share ABI. Eg. OS X, various Linuxes, BSD etc. I need to be able to update the code in place in shared folders and have the new code working more or less instantaneously. Distributing binary or source packages is less than ideal.
Launching a Python interpreter instance just to select the right binary to run would be much heavier than you need. I'd distribute a shell .rc file which provides aliases.
In /shared/bin, you put the various binaries: /shared/bin/toolname-mac, /shared/bin/toolname-debian-x86, /shared/bin/toolname-netbsd-dreamcast, etc. Then, in the common shared shell .rc file, you put the logic to set the aliases according to platform, so that on OSX, it gets alias toolname=/shared/bin/toolname-mac, and so forth.
This won't work as well if you're adding new tools all the time, because the users will need to reload the aliases.
I wouldn't recommend distributing tools this way, though. Testing and qualifying new builds of the tools should be taking up enough time and effort that the extra time required to distribute the tools to the users is trivial. You seem to be optimizing to reduce the distribution time. Replacing tools that quickly in a live environment is all too likely to result in lengthy and confusing downtime if anything goes wrong in writing and building the tools--especially when subtle cross-platform issues creep in.
Also, you could use autoconf and distribute your application in source form only. :)
You know, you should look at static linking.
These days, we all have HUGE hard drives, and a few extra megabytes (for carrying around libc and what not) is really not that big a deal anymore.
You could also try running your applications in chroot() jails and distributing those.
Depending on your mix os OSes, you might be better off creating packages for each class of system.
Alternatively, if they all share the same ABI and hardware architecture, you could also compile static binaries.
Related
I would like to force Python to use pre-compiled bytecode files that I provide, and avoid any other behavior that 'runs the script' but wastes time trying to regenerate the bytecode. Python however is designed to make this behavior generally transparent to the user, so I can't find any obvious ways to control (or even inspect) what it's doing in this regard.
My question(s):
Is my only option to remove the .py source, leaving only .pyc files? (I don't want to do that because keeping the source around has proven very useful for debugging in this scenario.)
How can I check up on this behavior to see what's really happening?
Background & Motivation
I am creating a package of Python modules to be distributed over read-only network file system, and I include pre-compiled bytecode as .pyc along with the Python packages. I can guarantee that the end users will always be using it with the same version of Python, on the same Linux OS, with the same kernel, in the same environment, etc, and therefore should never have to regenerate the bytecode. (This is a special setup for a mid-sized scientific experiment where we enforce these things in the environment we provide for our software developers. The package manager in this environment refuses to set up packages that don't match each other in the ways we specify.)
In this context, I want to ensure that Python does not try to regenerate bytecode or write .pyc's because we want to avoid slow startup times importing some large modules (matplotlib, scipy, and some others). Python's default behavior, as I understand it, is to
always check if the .pyc files are stale
always regenerate this bytecode if it is stale
always try to overwrite the .pyc file with the new bytecode, and
never tell the user if overwriting the .pyc failed.
Step 1 should not be necessary, but I'm not too worried about it because it only requires checking a few bytes. I think it doesn't take much time (even on our network file system, which is optimized for serving up small or partial binary files). I suppose I could be wrong about this for packages that have many .pyc files?
Step 2 should never be necessary after the package is built. If this happens then there is a bug in the package management.
Step 3 will always fail here because this is a read-only filesystem. If this happens then we need to know about it.
Step 4 is frustrating because I can't tell what's going on. And I have seen modules (e.g. matplotlib) which load more quickly on subsequent imports in this scenario, which I don't understand.
I have been developing a fairly extensive library of python modules that automate the more time consuming parts of "3D character development" for games/film/tv.
All of my code up until a few months ago has been run within Maya's dedicated python interpreter, however, my GUIs are built in PySide/PyQt, and so, run just fine in mac/windows/linux or a few other Graphics programs such as Nuke, XSI, Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people ---> using various different operating Systems ---> potentially using various applications (Nuke, XSI, Max), which, in turn, have their own dedicated python interpreters.
The obvious option would be pip and easy_install.. These modules are clearly the "right" way to go, but its not really clear how a user would install/run them under the dedicated python installs that ship with Maya/Nuke/ etc...Though, it does seem possible (as explained here). Still Its going to be a pretty big barrier for a less-technical user.
Any help or points in the right direction would be immensely appreciated..
I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment - all the modules etc. userSetup.py adds that zip to the path and the Python's native zipimport functionality handles the rest. This makes sure that there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havok after .py files get moved or renamed. Since this is all standard python, I'd assume this will work for any app-specific python that uses a 2.6+ version of python, though I've never tried it in Nuke or Max.
The main wrinkle will be modules with .pyd or other binary components, typically these don't work inside the zip files. I include a bootstrap routine which unpacks those to a (disposable) location on the user's disk and adds that to the path.
There's a detailed discussion of the method here and some background here
I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?
PYC files aren't guaranteed to be compatible for different python versions. If you don't know that all your customers are running the same python versions, you really don't want to distribute pyc's directly. So, you have to choose between distributing PYCs and supporting multiple python versions.
You could create build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools.; however it will be awkward to do because you'll have to run py_compile in every version you need to support.
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your python is supposed to be integrated into the user's python install, then it's probably simpler to just create a zip of your .py files and add a one-line .py stub that imports the zipped package(s) using zipfile
if it makes you feel better, PYC doesn't provide much extra security and it doesn't really boost perf much either :)
If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked your supported versions, you said, "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very linux-centric—mainly because it's primarily linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so they were mostly ignored…)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory), and also for each machine the user's roaming profile might end up on, which means you can safely put stuff there and know that the user has the permissions to write there without UAC getting involved, and also know (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.
Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?
Is it possible to deploy python applications such that you don't release the source code and you don't have to be sure the customer has python installed?
I'm thinking maybe there is some installation process that can run a python app from just the .pyc files and a shared library containing the interpreter or something like that?
Basically I'm keen to get the development benefits of a language like Python - high productivity etc. but can't quite see how you could deploy it professionally to a customer where you don't know how there machine is set up and you definitely can't deliver the source.
How do professional software houses developing in python do it (or maybe the answer is that they don't) ?
You protect your source code legally, not technologically. Distributing py files really isn't a big deal. The only technological solution here is not to ship your program (which is really becoming more popular these days, as software is provided over the internet rather than fully installed locally more often.)
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.
Yes, it is possible to make installation packages. Look for py2exe, cx_freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.
I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython compatible C code. You also get a small speed boost (~25% last I checked) since it will be compiled to native machine code instead of just Python byte code. You still need to be sure the user has Python installed (either by making it a pre-requisite pushed off onto the user to deal with, or bundling it as part of the installer process). Also, you do need to have at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
import cython_compiled_module
if __name__ == '__main__':
cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
As others stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much more close to the type of obfuscation you were initially hoping for.
Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.
Build a web application in python. Then the world can use it via a browser with zero install.
If I write a python script, anyone can simply point an editor to it and read it. But for programming written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.
Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. It's also optional: when running a non-module, i.e. an end-user script, Python will just interpret the code rather than compiling it first.
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (i.e. linux distributions) became more established. Some distros, for example gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under linux, or I think even OS/X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.
Python is a script language, runs in a virtual machine through an interpeter.
C is a compiled language, the code compiled to binary code which the computer can run without all that extra stuff Python needs.
This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is the Python is an "interpreted" language, which means that it requires a machine language program (the python interpreter) to run the python program, adding a layer of indirection. C or C++ are different. They are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.
In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin Python-to-C++ produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shedskin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.
Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.
you can't open up and read the code that actually runs for python either. Try
import dis
def foo():
for i in range(100):
print i
print dis.dis(foo)
That will show you the (human readable) bytcode of the foo program. equivalently, you can save the file and import it from the interactive python interpreter. This will create a .pyc file with the same basename as the script. open that with a hex editor and you are looking at the actually python bytecode.
The reason for the difference is that python changes up it's byte code between releases so that you would either need to distribute a different version of a binary only release for each version of python. This would be a pain.
With C, it's compiled to native code and so the byte code is much more stable making binary only releases possible.
because C code is complied to object (machine) code and python code is compiled into an intermediate byte code. I am not sure if you are even referring to the byte code of python - you must be referring to the source file itself which is directly executable (hiding the byte code from you!). C needs to be compiled and linked.
Python scripts are parsed and converted to binary only when they're run - i.e., they're text files and you can read them with an editor.
C code is compiled and linked to an executable binary file before they can be run. Normally, only this executable binary file is distributed - hence you need a decompiler. You can always view the source code, if you've access to it.
Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.
Python scripts are analogous to a man looking at a to-do list written in English (or language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the python program source, you see the to-do list. In case of the robot, you see the gears, motors and batteries, etc, which look very different from the to-do list. If you could get hold of the C "to-do" list, it looks somewhat like the python code, just in a different language.
G-WAN executes ANSI C scripts on the fly -making it just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...