Why is the GHC test suite written in Python, not Haskell? - python

I noticed that GHC (a widely-used Haskell compiler) has a test suite written in Python, not in Haskell (as I would naively expect). What is the history of this? Are there particular advantages to writing the test suite in a different language?
edit: Per a suggestion in the comments, I asked this in /r/haskell. It has now generated three answers, which I've quoted below:
tathougies said:
The test suite driver seems to be written in Python. Python is a good high-level scripting language.
It's like asking 'why does GHC use Make instead of haskell'? Probably because make is better at running shell programs with external dependency resolution built-in.
The tests themselves seem to be written in Haskell, verifying certain properties of the compiler and catching regressions. If they fail, it looks like the python driver is informed, and then would report the error to the user.
phadej added:
FWIW GHC's built system is being rewritten to use shake: the Haskell library.
eacameron said:
I don't know. But GHC doesn't have the luxury of using Haskell the same way you and I do. It has to bootstrap using a previous version of itself and it wants to avoid dependencies. Python is a pretty light-weight requirement since most systems (except Windows) come with it built in.

The commit message introducing Python explains a lot of it:
Revamp the testsuite framework. The previous framework was an
experiment that got a little out of control - a whole new language
with an interpreter written in Haskell was rather heavyweight and left
us with a maintenance problem.
So the new test driver is written in Python. The downside is that you
need Python to run the testsuite, but we don't think that's too big a
problem since it only affects developers and Python installs pretty
easily onto everything these days.
Highlights:
790 lines of Python, vs. 5300 lines of Haskell + 720 lines of <strange made-up language>.
the framework supports running tests in various "ways", which should
catch more bugs. By default, each test is run in three ways:
normal, -O, and -O -fasm. Additionally, if profiling libraries
have been built, another way (-O -prof -auto-all) is added. I plan
to also add a 'GHCi' way.
Running tests multiple ways has already shown up some new bugs!
documentation is in the README file and is somewhat improved.
the framework is rather less GHC-specific, and could without much
difficulty be coaxed into using other compilers. Most of the
GHC-specificness is in a separate configuration file (config/ghc).
Things may need a while to settle down. Expect some unexpected
failures.

Related

How to not give away source code with command line application in Python? [duplicate]

I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".
Is there a good way to handle this problem?
"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.
Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.
Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.
Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.
Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.
Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.
Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.
Offer it as a web service. SaaS involves no downloads to customers.
Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.
Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.
If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.
Obfuscation is really hard
Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.
Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.
Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy a service, pirate nor steal it. Maybe it's time to consider to go with the flow...
Compile python and distribute binaries!
Sensible idea:
Use Cython, Nuitka, Shed Skin or something similar to compile python to C code, then distribute your app as python binary libraries (pyd) instead.
That way, no Python (byte) code is left and you've done any reasonable amount of obscurification anyone (i.e. your employer) could expect from regular Code, I think. (.NET or Java less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product.. We're already building some thirdparty libs as pyd/dlls, so shipping our own python code as binaries is not a overly big step for us.)
See This Blog Post (not by me) for a tutorial on how to do it. (thx #hithwen)
Crazy idea:
You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.
Beyond crazy:
You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.
I understand that you want your customers to use the power of python but do not want expose the source code.
Here are my suggestions:
(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or swig to expose the C/C++ APIs to Python namespace.
(b) Use cython instead of Python
(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.
Have you had a look at pyminifier? It does Minify, obfuscate, and compress Python code. The example code looks pretty nasty for casual reverse engineering.
$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
def __init__(self,*args,**kwargs):
pass
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)
Is your employer aware that he can "steal" back any ideas that other people get from your code? I mean, if they can read your work, so can you theirs. Maybe looking at how you can benefit from the situation would yield a better return of your investment than fearing how much you could lose.
[EDIT] Answer to Nick's comment:
Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn't release the change, it's as if it didn't happen for everyone else.
Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).
If they don't change the copyright notice, the 2nd level customers will notice that the software comes from you original and wonder what is going on. Chances are that they will contact you and so you will learn about the reselling of your work.
Again we have two cases: The original customer sold only a few copies. That means they didn't make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.
But in the end, most companies try to comply to the law (once their reputation is ruined, it's much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back changes they made since that will make sure the change is in the next version and they don't have to maintain it. That's win-win: You get changes and they can make the change themselves if they really, desperately need it even if you're unwilling to include it in the official release.
Use Cython. It will compile your modules to high-performant C files, which can then be compiled to native binary libraries. This is basically un-reversable, compared to .pyc bytecode!
I've written a detailed article on how to set up Cython for a Python project, check it out:
Protecting Python Sources With Cython
Do not rely on obfuscation. As You have correctly concluded, it offers very limited protection.
UPDATE: Here is a link to paper which reverse engineered obfuscated python code in Dropbox. The approach - opcode remapping is a good barrier, but clearly it can be defeated.
Instead, as many posters have mentioned make it:
Not worth reverse engineering time (Your software is so good, it makes sense to pay)
Make them sign a contract and do a license audit if feasible.
Alternatively, as the kick-ass Python IDE WingIDE does: Give away the code. That's right, give the code away and have people come back for upgrades and support.
Shipping .pyc files has its problems - they are not compatible with any other python version than the python version they were created with, which means you must know which python version is running on the systems the product will run on. That's a very limiting factor.
In some circumstances, it may be possible to move (all, or at least a key part) of the software into a web service that your organization hosts.
That way, the license checks can be performed in the safety of your own server room.
Though there's no perfect solution, the following can be done:
Move some critical piece of startup code into a native library.
Enforce the license check in the native library.
If the call to the native code were to be removed, the program wouldn't start anyway. If it's not removed then the license will be enforced.
Though this is not a cross-platform or a pure-Python solution, it will work.
I was surprised in not seeing pyconcrete in any answer. Maybe because it's newer than the question?
It could be exactly what you need(ed).
Instead of obfuscating the code, it encrypts it and decrypts at load time.
From pypi page:
Protect python script work flow
your_script.py import pyconcrete
pyconcrete will hook import module
when your script do import MODULE,
pyconcrete import hook will try to find MODULE.pye first and then
decrypt MODULE.pye via _pyconcrete.pyd and execute decrypted data (as
.pyc content)
encrypt & decrypt secret key record in _pyconcrete.pyd
(like DLL or SO) the secret key would be hide in binary code, can’t
see it directly in HEX view
The reliable only way to protect code is to run it on a server you control and provide your clients with a client which interfaces with that server.
I think there is one more method to protect your Python code; part of the Obfuscation method. I believe there was a game like Mount and Blade or something that changed and recompiled their own python interpreter (the original interpreter which i believe is open source) and just changed the OP codes in the OP code table to be different then the standard python OP codes.
So the python source is unmodified but the file extensions of the *.pyc files are different and the op codes don't match to the public python.exe interpreter. If you checked the games data files all the data was in Python source format.
All sorts of nasty tricks can be done to mess with immature hackers this way. Stopping a bunch of inexperienced hackers is easy. It's the professional hackers that you will not likely beat. But most companies don't keep pro hackers on staff long I imagine (likely because things get hacked). But immature hackers are all over the place (read as curious IT staff).
You could for example, in a modified interpreter, allow it to check for certain comments or doc strings in your source. You could have special OP codes for such lines of code. For example:
OP 234 is for source line "# Copyright I wrote this"
or compile that line into op codes that are equivalent to "if False:" if "# Copyright" is missing. Basically disabling a whole block of code for what appears to be some obscure reason.
One use case where recompiling a modified interpreter may be feasible is where you didn't write the app, the app is big, but you are paid to protect it, such as when you're a dedicated server admin for a financial app.
I find it a little contradictory to leave the source or opcodes open for eyeballs, but use SSL for network traffic. SSL is not 100% safe either. But it's used to stop MOST eyes from reading it. A wee bit precaution is sensible.
Also, if enough people deem that Python source and opcodes are too visible, it's likely someone will eventually develop at least a simple protection tool for it. So the more people asking "how to protect Python app" only promotes that development.
Depending in who the client is, a simple protection mechanism, combined with a sensible license agreement will be far more effective than any complex licensing/encryption/obfuscation system.
The best solution would be selling the code as a service, say by hosting the service, or offering support - although that isn't always practical.
Shipping the code as .pyc files will prevent your protection being foiled by a few #s, but it's hardly effective anti-piracy protection (as if there is such a technology), and at the end of the day, it shouldn't achieve anything that a decent license agreement with the company will.
Concentrate on making your code as nice to use as possible - having happy customers will make your company far more money than preventing some theoretical piracy..
Another attempt to make your code harder to steal is to use jython and then use java obfuscator.
This should work pretty well as jythonc translate python code to java and then java is compiled to bytecode. So ounce you obfuscate the classes it will be really hard to understand what is going on after decompilation, not to mention recovering the actual code.
The only problem with jython is that you can't use python modules written in c.
You should take a look at how the guys at getdropbox.com do it for their client software, including Linux. It's quite tricky to crack and requires some quite creative disassembly to get past the protection mechanisms.
The best you can do with Python is to obscure things.
Strip out all docstrings
Distribute only the .pyc compiled files.
freeze it
Obscure your constants inside a class/module so that help(config) doesn't show everything
You may be able to add some additional obscurity by encrypting part of it and decrypting it on the fly and passing it to eval(). But no matter what you do someone can break it.
None of this will stop a determined attacker from disassembling the bytecode or digging through your api with help, dir, etc.
What about signing your code with standard encryption schemes by hashing and signing important files and checking it with public key methods?
In this way you can issue license file with a public key for each customer.
Additional you can use an python obfuscator like this one (just googled it).
Idea of having time restricted license and check for it in locally installed program will not work. Even with perfect obfuscation, license check can be removed. However if you check license on remote system and run significant part of the program on your closed remote system, you will be able to protect your IP.
Preventing competitors from using the source code as their own or write their inspired version of the same code, one way to protect is to add signatures to your program logic (some secrets to be able to prove that code was stolen from you) and obfuscate the python source code so, it's hard to read and utilize.
Good obfuscation adds basically the same protection to your code, that compiling it to executable (and stripping binary) does. Figuring out how obfuscated complex code works might be even harder than actually writing your own implementation.
This will not help preventing hacking of your program. Even with obfuscation code license stuff will be cracked and program may be modified to have slightly different behaviour (in the same way that compiling code to binary does not help protection of native programs).
In addition to symbol obfuscation might be good idea to unrefactor the code, which makes everything even more confusing if e.g. call graphs points to many different places even if actually those different places does eventually the same thing.
Logical signature inside obfuscated code (e.g. you may create table of values which are used by program logic, but also used as signature), which can be used to determine that code is originated from you. If someone decides to use your obfuscated code module as part of their own product (even after reobfuscating it to make it seem different) you can show, that code is stolen with your secret signature.
I have looked at software protection in general for my own projects and the general philosophy is that complete protection is impossible. The only thing that you can hope to achieve is to add protection to a level that would cost your customer more to bypass than it would to purchase another license.
With that said I was just checking google for python obsfucation and not turning up a lot of anything. In a .Net solution, obsfucation would be a first approach to your problem on a windows platform, but I am not sure if anyone has solutions on Linux that work with Mono.
The next thing would be to write your code in a compiled language, or if you really want to go all the way, then in assembler. A stripped out executable would be a lot harder to decompile than an interpreted language.
It all comes down to tradeoffs. On one end you have ease of software development in python, in which it is also very hard to hide secrets. On the other end you have software written in assembler which is much harder to write, but is much easier to hide secrets.
Your boss has to choose a point somewhere along that continuum that supports his requirements. And then he has to give you the tools and time so you can build what he wants. However my bet is that he will object to real development costs versus potential monetary losses.
Neiher Cython nor Nuitka were not the answer, because when running the solution that is compiled with Nuitka or Cython into .pyd or .exe files a cache directory is generated and all .pyc files are copied into the cache directory, so an attacker simply can decompile .pyc files and see your code or change it.
It is possible to have the py2exe byte-code in a crypted resource for a C launcher that loads and executes it in memory. Some ideas here and here.
Some have also thought of a self modifying program to make reverse engineering expensive.
You can also find tutorials for preventing debuggers, make the disassembler fail, set false debugger breakpoints and protect your code with checksums. Search for ["crypted code" execute "in memory"] for more links.
But as others already said, if your code is worth it, reverse engineers will succeed in the end.
Use the same way to protect binary file of c/c++, that is, obfuscate each function body in executable or library binary file, insert an instruction "jump" at the begin of each function entry, jump to special function to restore obfuscated code. Byte-code is binary code of Python script, so
First compile python script to code object
Then iterate each code object, obfuscate co_code of each code object as the following
0 JUMP_ABSOLUTE n = 3 + len(bytecode)
3
...
... Here it's obfuscated bytecode
...
n LOAD_GLOBAL ? (__pyarmor__)
n+3 CALL_FUNCTION 0
n+6 POP_TOP
n+7 JUMP_ABSOLUTE 0
Save obfuscated code object as .pyc or .pyo file
Those obfuscated file (.pyc or .pyo) can be used by normal python interpreter, when those code object is called first time
First op is JUMP_ABSOLUTE, it will jump to offset n
At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and put the original byte-code at offset 0. The obfuscated code can be got by the following code
char *obfucated_bytecode;
Py_ssize_t len;
PyFrameObject* frame = PyEval_GetFrame();
PyCodeObject *f_code = frame->f_code;
PyObject *co_code = f_code->co_code;
PyBytes_AsStringAndSize(co_code, &obfucated_bytecode, &len)
After this function returns, the last instruction is to jump to
offset 0. The really byte-code now is executed.
There is a tool Pyarmor to obfuscate python scripts by this way.
There is a comprehensive answer on concealing the python source code, which can be find here.
Possible techniques discussed are:
- use compiled bytecode (python -m compileall)
- executable creators (or installers like PyInstaller)
- software as an service (the best solution to conceal your code in my opinion)
- python source code obfuscators
using cxfreeze ( py2exe for linux ) will do the job.
http://cx-freeze.sourceforge.net/
it is available in ubuntu repositories
If we focus on software licensing, I would recommend to take a look at another Stack Overflow answer I wrote here to get some inspiration of how a license key verification system can be constructed.
There is an open-source library on GitHub that can help you with the license verification bit.
You can install it by pip install licensing and then add the following code:
pubKey = "<RSAKeyValue><Modulus>sGbvxwdlDbqFXOMlVUnAF5ew0t0WpPW7rFpI5jHQOFkht/326dvh7t74RYeMpjy357NljouhpTLA3a6idnn4j6c3jmPWBkjZndGsPL4Bqm+fwE48nKpGPjkj4q/yzT4tHXBTyvaBjA8bVoCTnu+LiC4XEaLZRThGzIn5KQXKCigg6tQRy0GXE13XYFVz/x1mjFbT9/7dS8p85n8BuwlY5JvuBIQkKhuCNFfrUxBWyu87CFnXWjIupCD2VO/GbxaCvzrRjLZjAngLCMtZbYBALksqGPgTUN7ZM24XbPWyLtKPaXF2i4XRR9u6eTj5BfnLbKAU5PIVfjIS+vNYYogteQ==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>"
res = Key.activate(token="WyIyNTU1IiwiRjdZZTB4RmtuTVcrQlNqcSszbmFMMHB3aWFJTlBsWW1Mbm9raVFyRyJd",\
rsa_pub_key=pubKey,\
product_id=3349, key="ICVLD-VVSZR-ZTICT-YKGXL", machine_code=Helpers.GetMachineCode())
if res[0] == None not Helpers.IsOnRightMachine(res[0]):
print("An error occured: {0}".format(res[1]))
else:
print("Success")
You can read more about the way the RSA public key, etc are configured here.
I documented how to obfuscate the python by converting it to .so file, and converting it to a python wheel file:
https://github.com/UM-NLP/python-obfuscation

How to create exe file from Python that is impossible to decode back to python? [duplicate]

I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".
Is there a good way to handle this problem?
"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.
Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.
Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.
Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.
Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.
Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.
Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.
Offer it as a web service. SaaS involves no downloads to customers.
Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.
Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.
If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.
Obfuscation is really hard
Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.
Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.
Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy a service, pirate nor steal it. Maybe it's time to consider to go with the flow...
Compile python and distribute binaries!
Sensible idea:
Use Cython, Nuitka, Shed Skin or something similar to compile python to C code, then distribute your app as python binary libraries (pyd) instead.
That way, no Python (byte) code is left and you've done any reasonable amount of obscurification anyone (i.e. your employer) could expect from regular Code, I think. (.NET or Java less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product.. We're already building some thirdparty libs as pyd/dlls, so shipping our own python code as binaries is not a overly big step for us.)
See This Blog Post (not by me) for a tutorial on how to do it. (thx #hithwen)
Crazy idea:
You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.
Beyond crazy:
You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.
I understand that you want your customers to use the power of python but do not want expose the source code.
Here are my suggestions:
(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or swig to expose the C/C++ APIs to Python namespace.
(b) Use cython instead of Python
(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.
Have you had a look at pyminifier? It does Minify, obfuscate, and compress Python code. The example code looks pretty nasty for casual reverse engineering.
$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
def __init__(self,*args,**kwargs):
pass
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)
Is your employer aware that he can "steal" back any ideas that other people get from your code? I mean, if they can read your work, so can you theirs. Maybe looking at how you can benefit from the situation would yield a better return of your investment than fearing how much you could lose.
[EDIT] Answer to Nick's comment:
Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn't release the change, it's as if it didn't happen for everyone else.
Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).
If they don't change the copyright notice, the 2nd level customers will notice that the software comes from you original and wonder what is going on. Chances are that they will contact you and so you will learn about the reselling of your work.
Again we have two cases: The original customer sold only a few copies. That means they didn't make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.
But in the end, most companies try to comply to the law (once their reputation is ruined, it's much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back changes they made since that will make sure the change is in the next version and they don't have to maintain it. That's win-win: You get changes and they can make the change themselves if they really, desperately need it even if you're unwilling to include it in the official release.
Use Cython. It will compile your modules to high-performant C files, which can then be compiled to native binary libraries. This is basically un-reversable, compared to .pyc bytecode!
I've written a detailed article on how to set up Cython for a Python project, check it out:
Protecting Python Sources With Cython
Do not rely on obfuscation. As You have correctly concluded, it offers very limited protection.
UPDATE: Here is a link to paper which reverse engineered obfuscated python code in Dropbox. The approach - opcode remapping is a good barrier, but clearly it can be defeated.
Instead, as many posters have mentioned make it:
Not worth reverse engineering time (Your software is so good, it makes sense to pay)
Make them sign a contract and do a license audit if feasible.
Alternatively, as the kick-ass Python IDE WingIDE does: Give away the code. That's right, give the code away and have people come back for upgrades and support.
Shipping .pyc files has its problems - they are not compatible with any other python version than the python version they were created with, which means you must know which python version is running on the systems the product will run on. That's a very limiting factor.
In some circumstances, it may be possible to move (all, or at least a key part) of the software into a web service that your organization hosts.
That way, the license checks can be performed in the safety of your own server room.
Though there's no perfect solution, the following can be done:
Move some critical piece of startup code into a native library.
Enforce the license check in the native library.
If the call to the native code were to be removed, the program wouldn't start anyway. If it's not removed then the license will be enforced.
Though this is not a cross-platform or a pure-Python solution, it will work.
I was surprised in not seeing pyconcrete in any answer. Maybe because it's newer than the question?
It could be exactly what you need(ed).
Instead of obfuscating the code, it encrypts it and decrypts at load time.
From pypi page:
Protect python script work flow
your_script.py import pyconcrete
pyconcrete will hook import module
when your script do import MODULE,
pyconcrete import hook will try to find MODULE.pye first and then
decrypt MODULE.pye via _pyconcrete.pyd and execute decrypted data (as
.pyc content)
encrypt & decrypt secret key record in _pyconcrete.pyd
(like DLL or SO) the secret key would be hide in binary code, can’t
see it directly in HEX view
The reliable only way to protect code is to run it on a server you control and provide your clients with a client which interfaces with that server.
I think there is one more method to protect your Python code; part of the Obfuscation method. I believe there was a game like Mount and Blade or something that changed and recompiled their own python interpreter (the original interpreter which i believe is open source) and just changed the OP codes in the OP code table to be different then the standard python OP codes.
So the python source is unmodified but the file extensions of the *.pyc files are different and the op codes don't match to the public python.exe interpreter. If you checked the games data files all the data was in Python source format.
All sorts of nasty tricks can be done to mess with immature hackers this way. Stopping a bunch of inexperienced hackers is easy. It's the professional hackers that you will not likely beat. But most companies don't keep pro hackers on staff long I imagine (likely because things get hacked). But immature hackers are all over the place (read as curious IT staff).
You could for example, in a modified interpreter, allow it to check for certain comments or doc strings in your source. You could have special OP codes for such lines of code. For example:
OP 234 is for source line "# Copyright I wrote this"
or compile that line into op codes that are equivalent to "if False:" if "# Copyright" is missing. Basically disabling a whole block of code for what appears to be some obscure reason.
One use case where recompiling a modified interpreter may be feasible is where you didn't write the app, the app is big, but you are paid to protect it, such as when you're a dedicated server admin for a financial app.
I find it a little contradictory to leave the source or opcodes open for eyeballs, but use SSL for network traffic. SSL is not 100% safe either. But it's used to stop MOST eyes from reading it. A wee bit precaution is sensible.
Also, if enough people deem that Python source and opcodes are too visible, it's likely someone will eventually develop at least a simple protection tool for it. So the more people asking "how to protect Python app" only promotes that development.
Depending in who the client is, a simple protection mechanism, combined with a sensible license agreement will be far more effective than any complex licensing/encryption/obfuscation system.
The best solution would be selling the code as a service, say by hosting the service, or offering support - although that isn't always practical.
Shipping the code as .pyc files will prevent your protection being foiled by a few #s, but it's hardly effective anti-piracy protection (as if there is such a technology), and at the end of the day, it shouldn't achieve anything that a decent license agreement with the company will.
Concentrate on making your code as nice to use as possible - having happy customers will make your company far more money than preventing some theoretical piracy..
Another attempt to make your code harder to steal is to use jython and then use java obfuscator.
This should work pretty well as jythonc translate python code to java and then java is compiled to bytecode. So ounce you obfuscate the classes it will be really hard to understand what is going on after decompilation, not to mention recovering the actual code.
The only problem with jython is that you can't use python modules written in c.
You should take a look at how the guys at getdropbox.com do it for their client software, including Linux. It's quite tricky to crack and requires some quite creative disassembly to get past the protection mechanisms.
The best you can do with Python is to obscure things.
Strip out all docstrings
Distribute only the .pyc compiled files.
freeze it
Obscure your constants inside a class/module so that help(config) doesn't show everything
You may be able to add some additional obscurity by encrypting part of it and decrypting it on the fly and passing it to eval(). But no matter what you do someone can break it.
None of this will stop a determined attacker from disassembling the bytecode or digging through your api with help, dir, etc.
What about signing your code with standard encryption schemes by hashing and signing important files and checking it with public key methods?
In this way you can issue license file with a public key for each customer.
Additional you can use an python obfuscator like this one (just googled it).
Idea of having time restricted license and check for it in locally installed program will not work. Even with perfect obfuscation, license check can be removed. However if you check license on remote system and run significant part of the program on your closed remote system, you will be able to protect your IP.
Preventing competitors from using the source code as their own or write their inspired version of the same code, one way to protect is to add signatures to your program logic (some secrets to be able to prove that code was stolen from you) and obfuscate the python source code so, it's hard to read and utilize.
Good obfuscation adds basically the same protection to your code, that compiling it to executable (and stripping binary) does. Figuring out how obfuscated complex code works might be even harder than actually writing your own implementation.
This will not help preventing hacking of your program. Even with obfuscation code license stuff will be cracked and program may be modified to have slightly different behaviour (in the same way that compiling code to binary does not help protection of native programs).
In addition to symbol obfuscation might be good idea to unrefactor the code, which makes everything even more confusing if e.g. call graphs points to many different places even if actually those different places does eventually the same thing.
Logical signature inside obfuscated code (e.g. you may create table of values which are used by program logic, but also used as signature), which can be used to determine that code is originated from you. If someone decides to use your obfuscated code module as part of their own product (even after reobfuscating it to make it seem different) you can show, that code is stolen with your secret signature.
I have looked at software protection in general for my own projects and the general philosophy is that complete protection is impossible. The only thing that you can hope to achieve is to add protection to a level that would cost your customer more to bypass than it would to purchase another license.
With that said I was just checking google for python obsfucation and not turning up a lot of anything. In a .Net solution, obsfucation would be a first approach to your problem on a windows platform, but I am not sure if anyone has solutions on Linux that work with Mono.
The next thing would be to write your code in a compiled language, or if you really want to go all the way, then in assembler. A stripped out executable would be a lot harder to decompile than an interpreted language.
It all comes down to tradeoffs. On one end you have ease of software development in python, in which it is also very hard to hide secrets. On the other end you have software written in assembler which is much harder to write, but is much easier to hide secrets.
Your boss has to choose a point somewhere along that continuum that supports his requirements. And then he has to give you the tools and time so you can build what he wants. However my bet is that he will object to real development costs versus potential monetary losses.
Neiher Cython nor Nuitka were not the answer, because when running the solution that is compiled with Nuitka or Cython into .pyd or .exe files a cache directory is generated and all .pyc files are copied into the cache directory, so an attacker simply can decompile .pyc files and see your code or change it.
It is possible to have the py2exe byte-code in a crypted resource for a C launcher that loads and executes it in memory. Some ideas here and here.
Some have also thought of a self modifying program to make reverse engineering expensive.
You can also find tutorials for preventing debuggers, make the disassembler fail, set false debugger breakpoints and protect your code with checksums. Search for ["crypted code" execute "in memory"] for more links.
But as others already said, if your code is worth it, reverse engineers will succeed in the end.
Use the same way to protect binary file of c/c++, that is, obfuscate each function body in executable or library binary file, insert an instruction "jump" at the begin of each function entry, jump to special function to restore obfuscated code. Byte-code is binary code of Python script, so
First compile python script to code object
Then iterate each code object, obfuscate co_code of each code object as the following
0 JUMP_ABSOLUTE n = 3 + len(bytecode)
3
...
... Here it's obfuscated bytecode
...
n LOAD_GLOBAL ? (__pyarmor__)
n+3 CALL_FUNCTION 0
n+6 POP_TOP
n+7 JUMP_ABSOLUTE 0
Save obfuscated code object as .pyc or .pyo file
Those obfuscated file (.pyc or .pyo) can be used by normal python interpreter, when those code object is called first time
First op is JUMP_ABSOLUTE, it will jump to offset n
At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and put the original byte-code at offset 0. The obfuscated code can be got by the following code
char *obfucated_bytecode;
Py_ssize_t len;
PyFrameObject* frame = PyEval_GetFrame();
PyCodeObject *f_code = frame->f_code;
PyObject *co_code = f_code->co_code;
PyBytes_AsStringAndSize(co_code, &obfucated_bytecode, &len)
After this function returns, the last instruction is to jump to
offset 0. The really byte-code now is executed.
There is a tool Pyarmor to obfuscate python scripts by this way.
There is a comprehensive answer on concealing the python source code, which can be find here.
Possible techniques discussed are:
- use compiled bytecode (python -m compileall)
- executable creators (or installers like PyInstaller)
- software as an service (the best solution to conceal your code in my opinion)
- python source code obfuscators
using cxfreeze ( py2exe for linux ) will do the job.
http://cx-freeze.sourceforge.net/
it is available in ubuntu repositories
If we focus on software licensing, I would recommend to take a look at another Stack Overflow answer I wrote here to get some inspiration of how a license key verification system can be constructed.
There is an open-source library on GitHub that can help you with the license verification bit.
You can install it by pip install licensing and then add the following code:
pubKey = "<RSAKeyValue><Modulus>sGbvxwdlDbqFXOMlVUnAF5ew0t0WpPW7rFpI5jHQOFkht/326dvh7t74RYeMpjy357NljouhpTLA3a6idnn4j6c3jmPWBkjZndGsPL4Bqm+fwE48nKpGPjkj4q/yzT4tHXBTyvaBjA8bVoCTnu+LiC4XEaLZRThGzIn5KQXKCigg6tQRy0GXE13XYFVz/x1mjFbT9/7dS8p85n8BuwlY5JvuBIQkKhuCNFfrUxBWyu87CFnXWjIupCD2VO/GbxaCvzrRjLZjAngLCMtZbYBALksqGPgTUN7ZM24XbPWyLtKPaXF2i4XRR9u6eTj5BfnLbKAU5PIVfjIS+vNYYogteQ==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>"
res = Key.activate(token="WyIyNTU1IiwiRjdZZTB4RmtuTVcrQlNqcSszbmFMMHB3aWFJTlBsWW1Mbm9raVFyRyJd",\
rsa_pub_key=pubKey,\
product_id=3349, key="ICVLD-VVSZR-ZTICT-YKGXL", machine_code=Helpers.GetMachineCode())
if res[0] == None not Helpers.IsOnRightMachine(res[0]):
print("An error occured: {0}".format(res[1]))
else:
print("Success")
You can read more about the way the RSA public key, etc are configured here.
I documented how to obfuscate the python by converting it to .so file, and converting it to a python wheel file:
https://github.com/UM-NLP/python-obfuscation

Encrypting the source code of a python project [duplicate]

I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time-restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".
Is there a good way to handle this problem?
"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.
Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.
Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.
Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.
Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.
Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.
Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.
Offer it as a web service. SaaS involves no downloads to customers.
Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.
Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.
If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.
Obfuscation is really hard
Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.
Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.
Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy a service, pirate nor steal it. Maybe it's time to consider to go with the flow...
Compile python and distribute binaries!
Sensible idea:
Use Cython, Nuitka, Shed Skin or something similar to compile python to C code, then distribute your app as python binary libraries (pyd) instead.
That way, no Python (byte) code is left and you've done any reasonable amount of obscurification anyone (i.e. your employer) could expect from regular Code, I think. (.NET or Java less safe than this case, as that bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product.. We're already building some thirdparty libs as pyd/dlls, so shipping our own python code as binaries is not a overly big step for us.)
See This Blog Post (not by me) for a tutorial on how to do it. (thx #hithwen)
Crazy idea:
You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.
Beyond crazy:
You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.
I understand that you want your customers to use the power of python but do not want expose the source code.
Here are my suggestions:
(a) Write the critical pieces of the code as C or C++ libraries and then use SIP or swig to expose the C/C++ APIs to Python namespace.
(b) Use cython instead of Python
(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.
Have you had a look at pyminifier? It does Minify, obfuscate, and compress Python code. The example code looks pretty nasty for casual reverse engineering.
$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ=ImportError
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱=print
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡=False
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨=object
try:
import demiurgic
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
import mystificate
except ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲמּ:
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺬ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ巡
class ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ澨):
def __init__(self,*args,**kwargs):
pass
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ클(self,dactyl):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲=mystificate.dark_voodoo(ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ퐐)
return ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𠛲
def ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯(self,whatever):
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ燱("Forming...")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ=ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐦚("epicaricacy","perseverate")
ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲﺃ.ﺭ异𞸐𐤔ﭞﰣﺁں𝕌𨿩𞸇뻛𐬑𥰫嬭ﱌ𢽁𐡆𧪮Ꝫﴹ뙫𢤴퉊ﳦﲣפּܟﺶ𐐤ﶨࠔ𐰷𢡶𧐎𐭈𞸏𢢘𦘼ﶻ𩏃𦽨𞺎𠛘𐠲䉊ﰸﭳᣲ𐠯("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)
Is your employer aware that he can "steal" back any ideas that other people get from your code? I mean, if they can read your work, so can you theirs. Maybe looking at how you can benefit from the situation would yield a better return of your investment than fearing how much you could lose.
[EDIT] Answer to Nick's comment:
Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn't release the change, it's as if it didn't happen for everyone else.
Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).
If they don't change the copyright notice, the 2nd level customers will notice that the software comes from you original and wonder what is going on. Chances are that they will contact you and so you will learn about the reselling of your work.
Again we have two cases: The original customer sold only a few copies. That means they didn't make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.
But in the end, most companies try to comply to the law (once their reputation is ruined, it's much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back changes they made since that will make sure the change is in the next version and they don't have to maintain it. That's win-win: You get changes and they can make the change themselves if they really, desperately need it even if you're unwilling to include it in the official release.
Use Cython. It will compile your modules to high-performant C files, which can then be compiled to native binary libraries. This is basically un-reversable, compared to .pyc bytecode!
I've written a detailed article on how to set up Cython for a Python project, check it out:
Protecting Python Sources With Cython
Do not rely on obfuscation. As You have correctly concluded, it offers very limited protection.
UPDATE: Here is a link to paper which reverse engineered obfuscated python code in Dropbox. The approach - opcode remapping is a good barrier, but clearly it can be defeated.
Instead, as many posters have mentioned make it:
Not worth reverse engineering time (Your software is so good, it makes sense to pay)
Make them sign a contract and do a license audit if feasible.
Alternatively, as the kick-ass Python IDE WingIDE does: Give away the code. That's right, give the code away and have people come back for upgrades and support.
Shipping .pyc files has its problems - they are not compatible with any other python version than the python version they were created with, which means you must know which python version is running on the systems the product will run on. That's a very limiting factor.
In some circumstances, it may be possible to move (all, or at least a key part) of the software into a web service that your organization hosts.
That way, the license checks can be performed in the safety of your own server room.
Though there's no perfect solution, the following can be done:
Move some critical piece of startup code into a native library.
Enforce the license check in the native library.
If the call to the native code were to be removed, the program wouldn't start anyway. If it's not removed then the license will be enforced.
Though this is not a cross-platform or a pure-Python solution, it will work.
I was surprised in not seeing pyconcrete in any answer. Maybe because it's newer than the question?
It could be exactly what you need(ed).
Instead of obfuscating the code, it encrypts it and decrypts at load time.
From pypi page:
Protect python script work flow
your_script.py import pyconcrete
pyconcrete will hook import module
when your script do import MODULE,
pyconcrete import hook will try to find MODULE.pye first and then
decrypt MODULE.pye via _pyconcrete.pyd and execute decrypted data (as
.pyc content)
encrypt & decrypt secret key record in _pyconcrete.pyd
(like DLL or SO) the secret key would be hide in binary code, can’t
see it directly in HEX view
The reliable only way to protect code is to run it on a server you control and provide your clients with a client which interfaces with that server.
I think there is one more method to protect your Python code; part of the Obfuscation method. I believe there was a game like Mount and Blade or something that changed and recompiled their own python interpreter (the original interpreter which i believe is open source) and just changed the OP codes in the OP code table to be different then the standard python OP codes.
So the python source is unmodified but the file extensions of the *.pyc files are different and the op codes don't match to the public python.exe interpreter. If you checked the games data files all the data was in Python source format.
All sorts of nasty tricks can be done to mess with immature hackers this way. Stopping a bunch of inexperienced hackers is easy. It's the professional hackers that you will not likely beat. But most companies don't keep pro hackers on staff long I imagine (likely because things get hacked). But immature hackers are all over the place (read as curious IT staff).
You could for example, in a modified interpreter, allow it to check for certain comments or doc strings in your source. You could have special OP codes for such lines of code. For example:
OP 234 is for source line "# Copyright I wrote this"
or compile that line into op codes that are equivalent to "if False:" if "# Copyright" is missing. Basically disabling a whole block of code for what appears to be some obscure reason.
One use case where recompiling a modified interpreter may be feasible is where you didn't write the app, the app is big, but you are paid to protect it, such as when you're a dedicated server admin for a financial app.
I find it a little contradictory to leave the source or opcodes open for eyeballs, but use SSL for network traffic. SSL is not 100% safe either. But it's used to stop MOST eyes from reading it. A wee bit precaution is sensible.
Also, if enough people deem that Python source and opcodes are too visible, it's likely someone will eventually develop at least a simple protection tool for it. So the more people asking "how to protect Python app" only promotes that development.
Depending in who the client is, a simple protection mechanism, combined with a sensible license agreement will be far more effective than any complex licensing/encryption/obfuscation system.
The best solution would be selling the code as a service, say by hosting the service, or offering support - although that isn't always practical.
Shipping the code as .pyc files will prevent your protection being foiled by a few #s, but it's hardly effective anti-piracy protection (as if there is such a technology), and at the end of the day, it shouldn't achieve anything that a decent license agreement with the company will.
Concentrate on making your code as nice to use as possible - having happy customers will make your company far more money than preventing some theoretical piracy..
Another attempt to make your code harder to steal is to use jython and then use java obfuscator.
This should work pretty well as jythonc translate python code to java and then java is compiled to bytecode. So ounce you obfuscate the classes it will be really hard to understand what is going on after decompilation, not to mention recovering the actual code.
The only problem with jython is that you can't use python modules written in c.
You should take a look at how the guys at getdropbox.com do it for their client software, including Linux. It's quite tricky to crack and requires some quite creative disassembly to get past the protection mechanisms.
The best you can do with Python is to obscure things.
Strip out all docstrings
Distribute only the .pyc compiled files.
freeze it
Obscure your constants inside a class/module so that help(config) doesn't show everything
You may be able to add some additional obscurity by encrypting part of it and decrypting it on the fly and passing it to eval(). But no matter what you do someone can break it.
None of this will stop a determined attacker from disassembling the bytecode or digging through your api with help, dir, etc.
What about signing your code with standard encryption schemes by hashing and signing important files and checking it with public key methods?
In this way you can issue license file with a public key for each customer.
Additional you can use an python obfuscator like this one (just googled it).
Idea of having time restricted license and check for it in locally installed program will not work. Even with perfect obfuscation, license check can be removed. However if you check license on remote system and run significant part of the program on your closed remote system, you will be able to protect your IP.
Preventing competitors from using the source code as their own or write their inspired version of the same code, one way to protect is to add signatures to your program logic (some secrets to be able to prove that code was stolen from you) and obfuscate the python source code so, it's hard to read and utilize.
Good obfuscation adds basically the same protection to your code, that compiling it to executable (and stripping binary) does. Figuring out how obfuscated complex code works might be even harder than actually writing your own implementation.
This will not help preventing hacking of your program. Even with obfuscation code license stuff will be cracked and program may be modified to have slightly different behaviour (in the same way that compiling code to binary does not help protection of native programs).
In addition to symbol obfuscation might be good idea to unrefactor the code, which makes everything even more confusing if e.g. call graphs points to many different places even if actually those different places does eventually the same thing.
Logical signature inside obfuscated code (e.g. you may create table of values which are used by program logic, but also used as signature), which can be used to determine that code is originated from you. If someone decides to use your obfuscated code module as part of their own product (even after reobfuscating it to make it seem different) you can show, that code is stolen with your secret signature.
I have looked at software protection in general for my own projects and the general philosophy is that complete protection is impossible. The only thing that you can hope to achieve is to add protection to a level that would cost your customer more to bypass than it would to purchase another license.
With that said I was just checking google for python obsfucation and not turning up a lot of anything. In a .Net solution, obsfucation would be a first approach to your problem on a windows platform, but I am not sure if anyone has solutions on Linux that work with Mono.
The next thing would be to write your code in a compiled language, or if you really want to go all the way, then in assembler. A stripped out executable would be a lot harder to decompile than an interpreted language.
It all comes down to tradeoffs. On one end you have ease of software development in python, in which it is also very hard to hide secrets. On the other end you have software written in assembler which is much harder to write, but is much easier to hide secrets.
Your boss has to choose a point somewhere along that continuum that supports his requirements. And then he has to give you the tools and time so you can build what he wants. However my bet is that he will object to real development costs versus potential monetary losses.
Neiher Cython nor Nuitka were not the answer, because when running the solution that is compiled with Nuitka or Cython into .pyd or .exe files a cache directory is generated and all .pyc files are copied into the cache directory, so an attacker simply can decompile .pyc files and see your code or change it.
It is possible to have the py2exe byte-code in a crypted resource for a C launcher that loads and executes it in memory. Some ideas here and here.
Some have also thought of a self modifying program to make reverse engineering expensive.
You can also find tutorials for preventing debuggers, make the disassembler fail, set false debugger breakpoints and protect your code with checksums. Search for ["crypted code" execute "in memory"] for more links.
But as others already said, if your code is worth it, reverse engineers will succeed in the end.
Use the same way to protect binary file of c/c++, that is, obfuscate each function body in executable or library binary file, insert an instruction "jump" at the begin of each function entry, jump to special function to restore obfuscated code. Byte-code is binary code of Python script, so
First compile python script to code object
Then iterate each code object, obfuscate co_code of each code object as the following
0 JUMP_ABSOLUTE n = 3 + len(bytecode)
3
...
... Here it's obfuscated bytecode
...
n LOAD_GLOBAL ? (__pyarmor__)
n+3 CALL_FUNCTION 0
n+6 POP_TOP
n+7 JUMP_ABSOLUTE 0
Save obfuscated code object as .pyc or .pyo file
Those obfuscated file (.pyc or .pyo) can be used by normal python interpreter, when those code object is called first time
First op is JUMP_ABSOLUTE, it will jump to offset n
At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and put the original byte-code at offset 0. The obfuscated code can be got by the following code
char *obfucated_bytecode;
Py_ssize_t len;
PyFrameObject* frame = PyEval_GetFrame();
PyCodeObject *f_code = frame->f_code;
PyObject *co_code = f_code->co_code;
PyBytes_AsStringAndSize(co_code, &obfucated_bytecode, &len)
After this function returns, the last instruction is to jump to
offset 0. The really byte-code now is executed.
There is a tool Pyarmor to obfuscate python scripts by this way.
There is a comprehensive answer on concealing the python source code, which can be find here.
Possible techniques discussed are:
- use compiled bytecode (python -m compileall)
- executable creators (or installers like PyInstaller)
- software as an service (the best solution to conceal your code in my opinion)
- python source code obfuscators
using cxfreeze ( py2exe for linux ) will do the job.
http://cx-freeze.sourceforge.net/
it is available in ubuntu repositories
If we focus on software licensing, I would recommend to take a look at another Stack Overflow answer I wrote here to get some inspiration of how a license key verification system can be constructed.
There is an open-source library on GitHub that can help you with the license verification bit.
You can install it by pip install licensing and then add the following code:
pubKey = "<RSAKeyValue><Modulus>sGbvxwdlDbqFXOMlVUnAF5ew0t0WpPW7rFpI5jHQOFkht/326dvh7t74RYeMpjy357NljouhpTLA3a6idnn4j6c3jmPWBkjZndGsPL4Bqm+fwE48nKpGPjkj4q/yzT4tHXBTyvaBjA8bVoCTnu+LiC4XEaLZRThGzIn5KQXKCigg6tQRy0GXE13XYFVz/x1mjFbT9/7dS8p85n8BuwlY5JvuBIQkKhuCNFfrUxBWyu87CFnXWjIupCD2VO/GbxaCvzrRjLZjAngLCMtZbYBALksqGPgTUN7ZM24XbPWyLtKPaXF2i4XRR9u6eTj5BfnLbKAU5PIVfjIS+vNYYogteQ==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>"
res = Key.activate(token="WyIyNTU1IiwiRjdZZTB4RmtuTVcrQlNqcSszbmFMMHB3aWFJTlBsWW1Mbm9raVFyRyJd",\
rsa_pub_key=pubKey,\
product_id=3349, key="ICVLD-VVSZR-ZTICT-YKGXL", machine_code=Helpers.GetMachineCode())
if res[0] == None not Helpers.IsOnRightMachine(res[0]):
print("An error occured: {0}".format(res[1]))
else:
print("Success")
You can read more about the way the RSA public key, etc are configured here.
I documented how to obfuscate the python by converting it to .so file, and converting it to a python wheel file:
https://github.com/UM-NLP/python-obfuscation

Is there an open source tool that automatically generates test cases for legacy code?

I recently stumbled over this (aged) article:
http://imranontech.com/2007/01/04/unit-testing-the-final-frontier-legacy-code/
where the author allegedly wrote a perl script to automatically generate test cases.
His strategy went like this (cited):
Read in the header files I gave it.
Extracted the function prototypes.
Gave me the list of functions it found and let me pick
which ones I wanted to create unit tests for.
It then created a dbx
(Solaris debugger) script which would break-point every time the
selected function was called, save the variables that were passed to
it and then continue until the function returned at which point it
would save the return value.
Run the executable under the dbx
script, and which point I proceeded to use the application as
normal, and just ran through lots of use cases which I thought would
go through the code in question and especially cases where I thought
it would hit edge cases in the functions I want to create unit tests
for.
The perl script then took all of the example runs, stripped out
duplicates, and then autogenerated a C file containing unit tests
for each of the examples (i.e pass in the input data and verify the
return value is the same as in the example run) Compiled/Linked/Ran
the unit tests and threw away ones which failed (i.e. get rid of
inputs which cause the function to behave non-deterministically)
I have a lot of legacy code of all kinds in the languages Python and Fortran. The article is from 2007. Is there anything like this implemented in current Unit testing frameworks?
How would i go about writing such a script?
Very C-like. Also, OS dependent, I think (Solaris debugger)? I'd say you should look at "record/capture and playback" tools, though somehow I think the "generate" part never really took off.
Python's testing tools taxonomy would be a great place to start. I'd say you either record your way through application using Selenium or Dogtail. The link takes you right to that section, Web testing tools, but check others as well: fuzzy testing is a technique similar to Golden Master, which sometimes may help with legacy apps, and is a "record / playback" technique. Feathers calls such tests "characterization" test, for they characterize legacy system's behaviours.
Very good point in article you cite:
Have a look at your own source code repository and see which
functions/classes have had the most bugfix checkins applied, 80% of
bugfixes tend to be made to about 20% of the code. There’s sound logic
behind this – often that 20% of the code is poorly written with dozens
or hundreds of “special case” hacks.
This is where I'd actually start. Have you got these parts identified? Simple Git/SVB log usage scripts and coverage tools section from the taxonomy would come in handy with this.
Unfortunately more than that I can't help you - my Python experience is limited and Fortran - non-existing.

Is Python good for big software projects (not web based)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
Right now I'm developing mostly in C/C++, but I wrote some small utilities in Python to automatize some tasks and I really love it as language (especially the productivity).
Except for the performances (a problem that could be sometimes solved thanks to the ease of interfacing Python with C modules), do you think it is proper for production use in the development of stand-alone complex applications (think for example to a word processor or a graphic tool)?
What IDE would you suggest? The IDLE provided with Python is not enough even for small projects in my opinion.
We've used IronPython to build our flagship spreadsheet application (40kloc production code - and it's Python, which IMO means loc per feature is low) at Resolver Systems, so I'd definitely say it's ready for production use of complex apps.
There are two ways in which this might not be a useful answer to you :-)
We're using IronPython, not the more usual CPython. This gives us the huge advantage of being able to use .NET class libraries. I may be setting myself up for flaming here, but I would say that I've never really seen a CPython application that looked "professional" - so having access to the WinForms widget set was a huge win for us. IronPython also gives us the advantage of being able to easily drop into C# if we need a performance boost. (Though to be honest we have never needed to do that. All of our performance problems to date have been because we chose dumb algorithms rather than because the language was slow.) Using C# from IP is much easier than writing a C Extension for CPython.
We're an Extreme Programming shop, so we write tests before we write code. I would not write production code in a dynamic language without writing the tests first; the lack of a compile step needs to be covered by something, and as other people have pointed out, refactoring without it can be tough. (Greg Hewgill's answer suggests he's had the same problem. On the other hand, I don't think I would write - or especially refactor - production code in any language these days without writing the tests first - but YMMV.)
Re: the IDE - we've been pretty much fine with each person using their favourite text editor; if you prefer something a bit more heavyweight then WingIDE is pretty well-regarded.
You'll find mostly two answers to that – the religous one (Yes! Of course! It's the best language ever!) and the other religious one (you gotta be kidding me! Python? No... it's not mature enough). I will maybe skip the last religion (Python?! Use Ruby!). The truth, as always, is far from obvious.
Pros: it's easy, readable, batteries included, has lots of good libraries for pretty much everything. It's expressive and dynamic typing makes it more concise in many cases.
Cons: as a dynamic language, has way worse IDE support (proper syntax completion requires static typing, whether explicit in Java or inferred in SML), its object system is far from perfect (interfaces, anyone?) and it is easy to end up with messy code that has methods returning either int or boolean or object or some sort under unknown circumstances.
My take – I love Python for scripting, automation, tiny webapps and other simple well defined tasks. In my opinion it is by far the best dynamic language on the planet. That said, I would never use it any dynamically typed language to develop an application of substantial size.
Say – it would be fine to use it for Stack Overflow, which has three developers and I guess no more than 30k lines of code. For bigger things – first your development would be super fast, and then once team and codebase grow things are slowing down more than they would with Java or C#. You need to offset lack of compilation time checks by writing more unittests, refactorings get harder cause you never know what your refacoring broke until you run all tests or even the whole big app, etc.
Now – decide on how big your team is going to be and how big the app is supposed to be once it is done. If you have 5 or less people and the target size is roughly Stack Overflow, go ahead, write in Python. You will finish in no time and be happy with good codebase. But if you want to write second Google or Yahoo, you will be much better with C# or Java.
Side-note on C/C++ you have mentioned: if you are not writing performance critical software (say massive parallel raytracer that will run for three months rendering a film) or a very mission critical system (say Mars lander that will fly three years straight and has only one chance to land right or you lose $400mln) do not use it. For web apps, most desktop apps, most apps in general it is not a good choice. You will die debugging pointers and memory allocation in complex business logic.
In my opinion python is more than ready for developing complex applications. I see pythons strength more on the server side than writing graphical clients. But have a look at http://www.resolversystems.com/. They develop a whole spreadsheet in python using the .net ironpython port.
If you are familiar with eclipse have a look at pydev which provides auto-completion and debugging support for python with all the other eclipse goodies like svn support. The guy developing it has just been bought by aptana, so this will be solid choice for the future.
#Marcin
Cons: as a dynamic language, has way
worse IDE support (proper syntax
completion requires static typing,
whether explicit in Java or inferred
in SML),
You are right, that static analysis may not provide full syntax completion for dynamic languages, but I thing pydev gets the job done very well. Further more I have a different development style when programming python. I have always an ipython session open and with one F5 I do not only get the perfect completion from ipython, but object introspection and manipulation as well.
But if you want to write second Google
or Yahoo, you will be much better with
C# or Java.
Google just rewrote jaiku to work on top of App Engine, all in python. And as far as I know they use a lot of python inside google too.
I really like python, it's usually my language of choice these days for small (non-gui) stuff that I do on my own.
However, for some larger Python projects I've tackled, I'm finding that it's not quite the same as programming in say, C++. I was working on a language parser, and needed to represent an AST in Python. This is certainly within the scope of what Python can do, but I had a bit of trouble with some refactoring. I was changing the representation of my AST and changing methods and classes around a lot, and I found I missed the strong typing that would be available to me in a C++ solution. Python's duck typing was almost too flexible and I found myself adding a lot of assert code to try to check my types as the program ran. And then I couldn't really be sure that everything was properly typed unless I had 100% code coverage testing (which I didn't at the time).
Actually, that's another thing that I miss sometimes. It's possible to write syntactically correct code in Python that simply won't run. The compiler is incapable of telling you about it until it actually executes the code, so in infrequently-used code paths such as error handlers you can easily have unseen bugs lurking around. Even code that's as simple as printing an error message with a % format string can fail at runtime because of mismatched types.
I haven't used Python for any GUI stuff so I can't comment on that aspect.
Python is considered (among Python programmers :) to be a great language for rapid prototyping. There's not a lot of extraneous syntax getting in the way of your thought processes, so most of the work you do tends to go into the code. (There's far less idioms required to be involved in writing good Python code than in writing good C++.)
Given this, most Python (CPython) programmers ascribe to the "premature optimization is the root of all evil" philosophy. By writing high-level (and significantly slower) Python code, one can optimize the bottlenecks out using C/C++ bindings when your application is nearing completion. At this point it becomes more clear what your processor-intensive algorithms are through proper profiling. This way, you write most of the code in a very readable and maintainable manner while allowing for speedups down the road. You'll see several Python library modules written in C for this very reason.
Most graphics libraries in Python (i.e. wxPython) are just Python wrappers around C++ libraries anyway, so you're pretty much writing to a C++ backend.
To address your IDE question, SPE (Stani's Python Editor) is a good IDE that I've used and Eclipse with PyDev gets the job done as well. Both are OSS, so they're free to try!
[Edit] #Marcin: Have you had experience writing > 30k LOC in Python? It's also funny that you should mention Google's scalability concerns, since they're Python's biggest supporters! Also a small organization called NASA also uses Python frequently ;) see "One coder and 17,000 Lines of Code Later".
Nothing to add to the other answers, besides that if you choose python you must use something like pylint which nobody mentioned so far.
One way to judge what python is used for is to look at what products use python at the moment. This wikipedia page has a long list including various web frameworks, content management systems, version control systems, desktop apps and IDEs.
As it says here - "Some of the largest projects that use Python are the Zope application server, YouTube, and the original BitTorrent client. Large organizations that make use of Python include Google, Yahoo!, CERN and NASA. ITA uses Python for some of its components."
So in short, yes, it is "proper for production use in the development of stand-alone complex applications". So are many other languages, with various pros and cons. Which is the best language for your particular use case is too subjective to answer, so I won't try, but often the answer will be "the one your developers know best".
Refactoring is inevitable on larger codebases and the lack of static typing makes this much harder in python than in statically typed languages.
And as far as I know they use a lot of python inside google too.
Well i'd hope so, the maker of python still works at google if i'm not mistaken?
As for the use of Python, i think it's a great language for stand-alone apps. It's heavily used in a lot of Linux programs, and there are a few nice widget sets out there to aid in the development of GUI's.
Python is a delight to use. I use it routinely and also write a lot of code for work in C#. There are two drawbacks to writing UI code in Python. one is that there is not a single ui framework that is accepted by the majority of the community. when you write in c# the .NET runtime and class libraries are all meant to work together. With Python every UI library has at's own semantics which are often at odds with the pythonic mindset in which you are trying to write your program. I am not blaming the library writers. I've tried several libraries (wxwidgets, PythonWin[Wrapper around MFC], Tkinter), When doing so I often felt that I was writing code in a language other than Python (despite the fact that it was python) because the libraries aren't exactly pythonic they are a port from another language be it c, c++, tk.
So for me I will write UI code in .NET (for me C#) because of the IDE & the consistency of the libraries. But when I can I will write business logic in python because it is more clear and more fun.
I know I'm probably stating the obvious, but don't forget that the quality of the development team and their familiarity with the technology will have a major impact on your ability to deliver.
If you have a strong team, then it's probably not an issue if they're familiar. But if you have people who are more 9 to 5'rs who aren't familiar with the technology, they will need more support and you'd need to make a call if the productivity gains are worth whatever the cost of that support is.
I had only one python experience, my trash-cli project.
I know that probably some or all problems depends of my inexperience with python.
I found frustrating these things:
the difficult of finding a good IDE for free
the limited support to automatic refactoring
Moreover:
the need of introduce two level of grouping packages and modules confuses me.
it seems to me that there is not a widely adopted code naming convention
it seems to me that there are some standard library APIs docs that are incomplete
the fact that some standard libraries are not fully object oriented annoys me
Although some python coders tell me that they does not have these problems, or they say these are not problems.
Try Django or Pylons, write a simple app with both of them and then decide which one suits you best. There are others (like Turbogears or Werkzeug) but those are the most used.

Categories