I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for getting your webserver compromised your own restricted-execution framework.
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
the worst case scenario is that someone gets into the database
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
Related
I'm creating a program in python (2.7) and I want to protect it from reverse engineering.
I compiled it using cx_freeze (supplies basic security- obfuscation and anti-debugging)
How can I add more protections such as obfuscation, packing, anti-debugging, encrypt the code recognize VM.
I thought maybe to encrypt to payload and decrypt it on run time, but I have no clue how to do it.
Generally speaking, it's almost impossible for you to make your program unbreakable as long as there's enough motive for the hackers.
But still you can make it harder to be reverse engineered, try to use cython to compile your core codes into pyd or so files.
There's no way to make anything digital safe nowadays.
What you CAN do is making it hard to a point where it's frustrating to do it, but I admit I don't know python specific ways to achieve that. The amount of security of your program is not actually a function of programsecurity, but of psychology.
Yes, psychology.
Given the fact that it's an arms race between crackers and anti-crackers, where both continuously attempt to top each other, the only thing one can do is trying to make it as frustrating as possible. How do we achieve that?
By being a pain in the rear!
Every additional step you take to make sure your code is hard to decipher is a good one.
For example could you turn your program into a single compiled block of bytecode, which you call from inside your program. Use an external library to encrypt it beforehand and decrypt it afterwards. Do the same with extra steps for codeblocks of functions. Or, have functions in precompiled blocks ready, but broken. At runtime, utilizing byteplay, repair the bytecode with bytes depending on other bytes of different functions, which would then stop your program from working when modified.
There are lots of ways of messing with people's heads and while I can't tell you any python specific ways, if you think in context of "How to be difficult", you'll find the weirdest ways of making it a mess to deal with your code.
Funnily enough this is much easier in assembly, than python, so maybe you should look into executing foreign code via ctypes or whatever.
Summon your inner Troll!
Story time: I was a Python programmer for a long time. Recently I joined in a company as a Python programmer. My manager was a Java programmer for a decade I guess. He gave me a project and at the initial review, he asked me that are we obfuscating the code? and I said, we don't do that kind of thing in Python. He said we do that kind of things in Java and we want the same thing to be implemented in python. Eventually I managed to obfuscate code just removing comments and spaces and renaming local variables) but entire python debugging process got messed up.
Then he asked me, Can we use ProGuard? I didn't know what the hell it was. After some googling I said it is for Java and cannot be used in Python. I also said whatever we are building we deploy in our own servers, so we don't need to actually protect the code. But he was reluctant and said, we have a set of procedures and they must be followed before deploying.
Eventually I quit my job after a year tired of fighting to convince them Python is not Java. I also had no interest in making them to think differently at that point of time.
TLDR; Because of the open source nature of the Python, there are no viable tools available to obfuscate or encrypt your code. I also don't think it is not a problem as long as you deploy the code in your own server (providing software as a service). But if you actually provide the product to the customer, there are some tools available to wrap up your code or byte code and give it like a executable file. But it is always possible to view your code if they want to. Or you choose some other language that provides better protection if it is absolutely necessary to protect your code. Again keep in mind that it is always possible to do reverse engineering on the code.
I'll try to keep this question objective. What is the canonical way to build a plugin-system for a Python desktop application?
Is it possible to have an easy to use (for the developer) system which achieves the following:
Users input their code using an in-app editor (the editor could end up dumping their plugins into a subdirectory of the app, if necessary)
Keep the source of the application "closed" (Yes, there are pitfalls to obfuscation)
Thanks
I am guessing you need some sort of educational system where the user can submit code, presumable to check that the code performs cf. a exercise.
My immediate thoughts about this would to use a web-interface. In this manner, the code of the evaluating system is entirely hidden (unless the student hacks your webserver, but this is an entirely different topic).
However, for this to work you must be aware of the pitfalls of allowing others submit code to your service, that is then executed. The code must be rigorously checked for the obvious things, but the issues here are open-bounded. You must also protect your service from the back-end by providing a safe environment for executing (e.g. a sandbox).
let us assume that there is a big, commercial project (a.k.a Project), which uses Python under the hood to manage plugins for configuring new control surfaces which can be attached and used by Project.
There was a small information leak, some part of the Project's Python API leaked to the public information and people were able to write Python scripts which were called by the underlying Python implementation as a part of Project's plugin loading mechanism.
Further on, using inspect module and raw __dict__ readings, people were able to find out a major part of Project's underlying Python implementation.
Is there a way to keep the Python secret codes secret?
Quick look at Python's documentation revealed a way to suppres a import of inspect module this way:
import sys
sys.modules['inspect'] = None
Does it solve the problem completely?
No, this does not solve the problem. Someone could just rename the inspect module to something else and import it.
What you're trying to do is not possible. The python interpreter must be able to take your bytecode and execute it. Someone will always be able to decompile the bytecode. They will always be able to produce an AST and view the flow of the code with variable and class names.
Note that this process can also be done with compiled language code; the difference there is that you will get assembly. Some tools can infer C structure from the assembly, but I don't have enough experience with that to comment on the details.
What specific piece of information are you trying to hide? Could you keep the algorithm server side and make your software into a client that touches your web service? Keeping the code on a machine you control is the only way to really keep control over the code. You can't hand someone a locked box, the keys to the box, and prevent them from opening the box when they have to open it in order to run it. This is the same reason DRM does not work.
All that being said, it's still possible to make it hard to reverse engineer, but it will never be impossible when the client has the executable.
There is no way to keep your application code an absolute secret.
Frankly, if a group of dedicated and determined hackers (in the good sense, not in the pejorative sense) can crack the PlayStation's code signing security model, then your app doesn't stand a chance. Once you put your app into the hands of someone outside your company, it can be reverse-engineered.
Now, if you want to put some effort into making it harder, you can compile your own embedded python executable, strip out unnecessary modules, obfuscate the compiled python bytecode and wrap it up in some malware rootkit that refuses to start your app if a debugger is running.
But you should really think about your business model. If you see the people who are passionate about your product as a threat, if you see those who are willing to put time and effort into customizing your product to personalize their experience as a danger, perhaps you need to re-think your approach to security. Assuming you're not in the DRM business, or have a similar model that involves squeezing money from reluctant consumers, consider developing an approach that involves sharing information with your users, and allowing them to collaboratively improve your product.
Is there a way to keep the Python secret codes secret?
No there is not.
Python is particularly easy to reverse engineer, but other languages, even compiled ones, are easy enough to reverse.
You cannot fully prevent reverse engineering of software - if it comes down to it, one can always analyze the assembler instructions your program consists of.
You can, however, significantly complicate the process, for example by messing with Python internals. However, before jumping to how to do it, I'd suggest you evaluate whether to do it. It's usually harder to "steal" your code (one needs to fully understand them to be able to extend them, after all) than code it oneself. A pure, unobfuscated Python plugin interface, however, can be vital in creating a whole ecosystem around your program, far outweighing the possible downsides to having someone peek in your maybe not perfectly designed coding internals.
I have an application written in python. I created a plugin system for the application that uses egg files. Egg files contain compiled python files and can be easily decompiled and used to hack the application. Is there a way to secure this system? I'd like to use digital signature for this - sign these egg files and check the signature before loading such egg file. Is there a way to do this programmatically from python? Maybe using winapi?
Is there a way to secure this system?
The answer is "that depends".
The two questions you should ask is "what are people supposed to be able to do" and "what are people able to do (for a given implementation)". If there exists an implementation where the latter is a subset of the former, the system can be secured.
One of my friend is working on a programming competition judge: a program which runs a user-submitted program on some test data and compares its output to a reference output. That's damn hard to secure: you want to run other peoples' code, but you don't want to let them run arbitrary code. Is your scenario somewhat similar to this? Then the answer is "it's difficult".
Do you want users to download untrustworthy code from the web and run it with some assurance that it won't hose their machine? Then look at various web languages. One solution is not offering access to system calls (JavaScript) or offering limited access to certain potentially dangerous calls (Java's SecurityManager). None of them can be done in python as far as I'm aware, but you can always hack the interpreter and disallow the loading of external modules not on some whitelist. This is probably error-prone.
Do you want users to write plugins, and not be able to tinker with what the main body of code in your application does? Consider that users can decompile .pyc files and modify them. Assume that those running your code can always modify it, and consider the gold-farming bots for WoW.
One Linux-only solution, similar to the sandboxed web-ish model, is to use AppArmor, which limits which files your app can access and which system calls it can make. This might be a feasible solution, but I don't know much about it so I can't give you advice other than "investigate".
If all you worry about is evil people modifying code while it's in transit in the intertubes, standard cryptographic solutions exist (SSL). If you want to only load signed plugins (because you want to control what the users do?), signing code sounds like the right solution (but beware of crafty users or evil people who edit the .pyc files and disables the is-it-signed check).
Maybe some crypto library like this http://chandlerproject.org/Projects/MeTooCrypto helps to build an ad-hoc solution. Example usage: http://tdilshod.livejournal.com/38040.html
For a Django application that I'm working on, I wanted to allow group membership to be determined by Active Directory group. After a while of digging through the pywin32 documentation, I came up with this:
>>> import win32net
>>> win32net.NetUserGetGroups('domain_name.com', 'username')
[(u'Domain Users', 7), ...]
I spent a while googling before I figured this out though, and the examples I found almost exclusively used LDAP for this kind of thing. Is there any reason why that's to be preferred over this method? Bear a couple things in mind:
I'm not using Active Directory to actually perform authentication, only permissions. Authentication is performed by another server.
While it would be nice to have some cross-platform capabilities, this will probably run almost exclusively on Windows.
AD's LDAP interface has quite a few 'quirks' that make it more difficult to use than it might appear on the surface, and it tends to lag significantly behind on features. When I worked with it, I mostly dealt with authentication, but it's probably the same no matter what you're doing. There's a lot of weirdness in terms of having to be bound as a certain user just to do simple searches that a normal LDAP server would let you do as anonymous.
Also, at least as of a year ago, when I worked on this, python-ldap was the only Python LDAP implementation to support anywhere close to the full feature set, since it's built on top of OpenLDAP, However, OpenLDAP is rather difficult to build on Windows (and in general), so most builds will be missing one or more features. Although you're not doing authentication, a lack of SASL/Kerberos support (which was missing at the time I used it) might make things complicated for you.
If you have something that works, and only need to run it on Windows, I would really recommend sticking to it; using AD via LDAP can turn into a big project.
import wmi
oWMI = wmi.WMI(namespace="directory\ldap")
ADUsers = oWMI.query("select ds_name from ds_user")
for user in ADUsers:
print user.ds_name
Check out Tim Golden's Python Stuff.
import active_directory
user = active_directory.find_user(user_name)
groups = user.memberOf