I'd like to make a website where people could upload their Python scripts. Of course I'd like to execute those scripts. Those scripts should do some interesting work. The problem is that people could upload scripts that could harm my server and I'd like to prevent that. What is the option to run arbitrary scripts without harming my system - actually without seeing my system at all? Thank you
"Can't be done."
Running arbitrary (untrusted) scripts and staying safe is a contradiction. You should go as far as using custom kernels, jails, VMs, and the like.
You can look at how http://codepad.org/about does it, it's a lot of work.
I don't know about earlier versions, but in Python 3 you can create functions with access to a custom scope through types.FunctionType.
    from types import FunctionType

    def f():
        return __builtins__

    f()  # works, because f has access to __builtins__
    scope = {}
    sandboxed = FunctionType(f.__code__, scope)
    sandboxed()  # raises NameError: name '__builtins__' is not defined
The returned function only sees the globals you supplied in the scope dictionary. I wonder whether there are still hacks around this.
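As a partial answer to the "hacks" question, here is a minimal sketch (the function name `probe` is made up for illustration) showing that a function rebuilt with an empty scope can still reach every class loaded in the interpreter through plain attribute access. The scope sets `__builtins__` to an empty dict explicitly, because on some Python versions a scope without that key may still fall back to the interpreter's real builtins.

```python
from types import FunctionType

def probe():
    # No global or builtin name is looked up here: the walk starts from a
    # tuple literal and proceeds purely through attributes.
    return ().__class__.__base__.__subclasses__()

# Explicitly empty builtins, so no builtin name is available inside.
scope = {"__builtins__": {}}
sandboxed = FunctionType(probe.__code__, scope)

leaked_classes = sandboxed()
print(len(leaked_classes) > 0)  # the "sandboxed" code still sees real classes
```

From a list like this, an attacker typically hunts for a class whose methods expose module globals, which is why restricting the scope alone is not treated as a security boundary.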
Quite a lot of web servers run untrusted Python code nowadays:
http://codepad.org/ (probably the best-known pastebin for Python code)
http://codingbat.com/ (previously Javabat, renamed to reflect the Python addition)
http://appengine.google.com/ (hosts Python code on Google's infrastructure)
http://www.spoj.pl/ (the infamous Sphere Online Judge coding challenge)
You may want to look at how they approached the problem.
Or you may want to look at a different approach:
http://pyjs.org/ - pyjamas - Python-to-JavaScript compiler (runs client-side, shifting the security problem to the user's side)
"Can't be done," is too harsh. JavaScript engines live in your web browser and they accept and run untrusted scripts safely. There's always the possibility of exploits, but in correct engine operation they are innocuous. There are even "slow script" checks that prevent infinite loops from denial-of-service attacking your browser, making those little alert dialogs.
Google App Engine runs a sandboxed version of the Python VM which effectively removes all the naughty native bits that let you get at the underlying system. To do this yourself in a safe manner would take some Python VM expertise.
For sanity, you could start off by removing all builtins and whitelisting the ones you want to allow users once you certify they don't touch the underlying system.
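A minimal sketch of that starting point, assuming a hypothetical whitelist (`ALLOWED`) and helper name (`run_restricted`). It only shows the mechanics of replacing `__builtins__` for `exec`; on its own it is not a secure sandbox, since code can still reach other objects through attribute access.

```python
import builtins

# Hypothetical whitelist: names judged harmless for this illustration.
ALLOWED = ("abs", "len", "max", "min", "range", "sum")
SAFE_BUILTINS = {name: getattr(builtins, name) for name in ALLOWED}

def run_restricted(source):
    """Execute source with only the whitelisted builtins visible."""
    # Copy the dict so one script cannot alter the whitelist for the next.
    scope = {"__builtins__": dict(SAFE_BUILTINS)}
    exec(source, scope)
    return scope

result = run_restricted("total = sum(range(5))")
print(result["total"])  # 10

try:
    run_restricted("open('/etc/passwd')")
    blocked = False
except NameError:
    blocked = True  # open() is simply not defined inside the scope
print(blocked)  # True
```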
It feels like something somebody must have already done, but I don't know of any existing project that does it. :-/
I think the way to do this is to run those scripts in normal Python shell, but on a virtual machine. I might be biased, because my "job" is currently to play around with VMs (universities are great!).
A new VM instance can be created and started in seconds. If you keep a few around and replace only those that get broken, you have good service, absolute security and almost no effort.
But there is one thing: Virtually all web hosts today are virtual machines and they don't support another virtual machine inside. You need a real, physical server to do this.
Brett Cannon has a tentative design for doing this, last I knew, but it has not been developed. So unless you're looking to put a LOT of effort into making this happen, there currently isn't a solution publicly available.
Brett's blog is at: http://sayspy.blogspot.com/ if you want to try to read up on it, I couldn't find a direct link to his discussions about the new security design. I can't recall if I read his blog talking about it, or if it was in person where he mentioned it, sorry.
There used to be some restricted execution abilities, but they were dropped because they just didn't work.
It's not impossible to do, but it's not something that Python is able to do right now. It's something people would like, but it's not really a high priority from what I've seen.
trypython.org (BSD licensed source here) does a safe browser oriented version of such a sandbox in IronPython (via Silverlight/Moonlight). You may be able to mash together a headless version of this for use on a server -- but you could definitely let users distribute scripts between each other, or you could distribute these scripts to be executed within the plugin environment.
If you use Linux, seccomp may be the solution; its mode 2 is even nicer. With it you can create a new process in which any other syscall fails and which can only use already open file descriptors.
Namespaces and cgroups might also help; this can be done from Python with ctypes.
You could try Ideone API - it allows Python 2 and Python 3
Software these days can be separated into two categories: runs on client infrastructure (like in the case of enterprise software, like Splunk or Tibco), OR runs on the infrastructure of a software provider (like in the case of Facebook, where you need to use their API to access the backend).
In the first category, the client pays for a license and receives the software to run on their own machines on their premises of choice. The client IS in possession of the actual code and software.
In the second category, the software resides somewhere external and can be accessed only by an API. The client is NOT in possession of the software and can only use it to the extent allowed by the API.
My question is: in the first category above, how is the actual code kept hidden from the client?
Let's say I've built a really awesome analysis engine in Python for analyzing output logs. A corporate client is interested in using it for their internal applications. However, they insist that my engine must run on their own machines for security reasons. If I succumb and give them my Python code, then I will risk my intellectual property.
In that case, do I need to rewrite all my code in a compiled language like C++ so that it is obfuscated at compile time? Or is there a way to keep it in Python but secure the source code in another way?
Update:
Given the answers below, in that case, would the more efficient pathway to developing a client-hosted application (i.e.: first category above) be to write a proof of concept in a more convenient language like Python first, and then take those ideas and rewrite it into C++?
The short answer is that you pretty much can't. You can do workarounds, but in the end almost anyone can reverse engineer your code no matter how you obfuscate it.
Your best bet might be to use something like PyInstaller and see if you can include only .pyc files. That doesn't protect you all the way, but it at least makes it a pain to reverse. You might even be able to find an obfuscator to run first, but I don't know much about that part.
Similar to @gabeappleton's suggestion above, compile the Python code into an EXE. I use cx_freeze quite regularly and have good success. It's pretty well documented and has reasonable support on these forums.
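To illustrate why shipping only .pyc files is a speed bump rather than real protection, here is a small sketch. The module contents (`SECRET_WEIGHT`, `score`) are invented for the example, and the 16-byte header size is an assumption that holds for Python 3.7 and later. It compiles a source file and then reads names and constants straight back out of the bytecode.

```python
import marshal
import os
import py_compile
import tempfile

# Hypothetical "proprietary" module, used only for this illustration.
SOURCE = "SECRET_WEIGHT = 0.87\n\ndef score(x):\n    return x * SECRET_WEIGHT\n"

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "engine.py")
    pyc_path = os.path.join(tmp, "engine.pyc")
    with open(src_path, "w") as fh:
        fh.write(SOURCE)
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as fh:
        fh.read(16)  # skip the .pyc header (16 bytes on Python 3.7+)
        module_code = marshal.load(fh)  # the module's code object

# Identifiers and literal values survive compilation untouched.
recovered_names = module_code.co_names
recovered_consts = module_code.co_consts
print(recovered_names)
print(0.87 in recovered_consts)
```

Decompilers build on exactly this information, so the bytecode-only route mostly raises the effort required rather than hiding the logic.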
I have a Python project that dynamically loads Python scripts from a set of specified directories and executes an expected function from each of them. To harden the security of this application, I would like to analyze the scripts to ensure that they are just pure math functions and, therefore, not interacting with any system components such as the HDD/SSD, the network, a database, etc. Is this even possible to do in Python?
This question has been moved to https://security.stackexchange.com/questions/131283/how-does-one-verify-that-a-python-script-is-a-pure-math-function, but I'm leaving this here, for now, to keep the comments and answers that have already been provided.
It appears that sandboxing to disable things like I/O, network access, etc. isn't fully reliable.
Since Python doesn't have any permission system embedded, it'll be pretty hard to do what you want.
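One partial approach is a static check that accepts a script only if its syntax tree consists of a small whitelist of node types. The sketch below assumes Python 3.8+ and a deliberately tiny, hypothetical whitelist (`ALLOWED_NODES`): arithmetic on arguments and constants, with no calls, attribute access or imports. It is an illustration of the idea, not a proof of purity.

```python
import ast

# Hypothetical whitelist of syntax nodes for "pure arithmetic" functions.
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.Expr, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)

def looks_like_pure_math(source):
    """Return True only if every node in the parsed source is whitelisted."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))

good = "def f(x, y):\n    return x ** 2 + 3 * y\n"
bad = "import os\n\ndef f(x):\n    return os.getpid()\n"

print(looks_like_pure_math(good))  # True
print(looks_like_pure_math(bad))   # False
```

Because calls are excluded, even `math.sqrt` is rejected here; widening the whitelist to allow calls is where such a checker becomes hard to get right.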
I'll try to keep this question objective. What is the canonical way to build a plugin-system for a Python desktop application?
Is it possible to have an easy to use (for the developer) system which achieves the following:
Users input their code using an in-app editor (the editor could end up dumping their plugins into a subdirectory of the app, if necessary)
Keep the source of the application "closed" (Yes, there are pitfalls to obfuscation)
Thanks
I am guessing you need some sort of educational system where the user can submit code, presumably to check that the code performs as required by an exercise.
My immediate thought would be to use a web interface. In this manner, the code of the evaluating system is entirely hidden (unless the student hacks your web server, but that is an entirely different topic).
However, for this to work you must be aware of the pitfalls of allowing others to submit code to your service that is then executed. The code must be rigorously checked for the obvious things, but the set of possible issues is open-ended. You must also protect your service on the back end by providing a safe environment for execution (e.g. a sandbox).
I'm creating a corewars-type application that runs on Django and allows a user to upload some Python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some builtins I plan to block that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
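The AST scan mentioned in the list could look roughly like the sketch below. The banned set simply mirrors the names listed above, and `find_violations` is a hypothetical helper name. A scan like this only catches direct uses, so it is a first filter rather than a defense.

```python
import ast

# The names from the list above.
BANNED_NAMES = {"__import__", "open", "file", "input", "raw_input"}

def find_violations(source):
    """Return sorted (line, reason) pairs for imports and banned names."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append((node.lineno, "import statement"))
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            violations.append((node.lineno, "use of " + node.id))
    return sorted(violations)

sample = "import os\ndata = open('scores.txt')\n"
print(find_violations(sample))
# [(1, 'import statement'), (2, 'use of open')]
```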
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for building your own restricted-execution framework (or, put less kindly, for getting your webserver compromised).
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
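A quick sketch of why that trick defeats name-based scanning: the payload below is parsed but never executed, and neither the identifiers nor the individual string literals contain `__import__`. Only the call to `eval` itself is visible to a scanner.

```python
import ast

# The payload from the example above; it is only parsed, never run.
PAYLOAD = 'eval("__impor" + "t__(\'whatever\').destroy_your_server")'

tree = ast.parse(PAYLOAD)
names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
strings = [
    n.value for n in ast.walk(tree)
    if isinstance(n, ast.Constant) and isinstance(n.value, str)
]

print("__import__" in names)                    # False
print(any("__import__" in s for s in strings))  # False
print("__import__" in "".join(strings))         # True, but only after joining
print("eval" in names)                          # True
```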
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
"the worst case scenario is that someone gets into the database"
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
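To make that complication concrete, here is a sketch with a hypothetical host-side `Character` class. The player code runs with empty builtins, yet it recovers the real `open` by following attributes on an object the host handed to it.

```python
class Character:
    """Host-side object passed to player code (hypothetical)."""
    def position(self):
        return (0, 0)

PLAYER_SOURCE = '''
def leak(character):
    # No builtin name is used; everything is reached through attributes.
    return character.position.__func__.__globals__["__builtins__"]
'''

scope = {"__builtins__": {}}  # player code sees no builtins at all
exec(PLAYER_SOURCE, scope)
leaked = scope["leak"](Character())

# __builtins__ may be the builtins module or its dict, depending on context.
leaked_open = leaked["open"] if isinstance(leaked, dict) else leaked.open
print(leaked_open is open)  # True
```

Any object passed into untrusted code therefore has to be treated as a potential path back to the host's globals.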
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
I have an application written in python. I created a plugin system for the application that uses egg files. Egg files contain compiled python files and can be easily decompiled and used to hack the application. Is there a way to secure this system? I'd like to use digital signature for this - sign these egg files and check the signature before loading such egg file. Is there a way to do this programmatically from python? Maybe using winapi?
Is there a way to secure this system?
The answer is "that depends".
The two questions you should ask are "what are people supposed to be able to do" and "what are people able to do (for a given implementation)". If there exists an implementation where the latter is a subset of the former, the system can be secured.
One of my friends is working on a programming competition judge: a program which runs a user-submitted program on some test data and compares its output to a reference output. That's damn hard to secure: you want to run other people's code, but you don't want to let them run arbitrary code. Is your scenario somewhat similar to this? Then the answer is "it's difficult".
Do you want users to download untrustworthy code from the web and run it with some assurance that it won't hose their machine? Then look at various web languages. One solution is not offering access to system calls (JavaScript) or offering limited access to certain potentially dangerous calls (Java's SecurityManager). Neither can be done in Python as far as I'm aware, but you can always hack the interpreter and disallow the loading of external modules not on some whitelist. This is probably error-prone.
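The functional core of such a judge is small; the hard part is the isolation it lacks. The sketch below (the function name `judge` and the verdict strings are made up) runs a submission in a separate interpreter process with a time limit and compares its output to the reference. It provides no protection against malicious code and would need to sit inside a jail, container or VM.

```python
import subprocess
import sys

def judge(source, test_input, expected_output, timeout_s=2.0):
    """Run source in a child interpreter and return a verdict string."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "time limit exceeded"
    if proc.returncode != 0:
        return "runtime error"
    if proc.stdout.strip() == expected_output.strip():
        return "accepted"
    return "wrong answer"

print(judge("print(int(input()) * 2)", "21\n", "42\n"))  # accepted
print(judge("print(1)", "", "2\n"))                      # wrong answer
```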
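A module whitelist can be approximated without patching the interpreter by handing plugin code a guarded `__import__`. This is a sketch under assumptions: `ALLOWED_MODULES`, `guarded_import` and `run_plugin` are hypothetical names, and the approach is bypassable, because a whitelisted module or any reachable object may expose other modules.

```python
import builtins

ALLOWED_MODULES = {"math", "itertools"}  # hypothetical whitelist

def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Delegate to the real import only for whitelisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError("module %r is not on the whitelist" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)

# All normal builtins, except that import statements hit the guard.
PLUGIN_BUILTINS = dict(vars(builtins), __import__=guarded_import)

def run_plugin(source):
    scope = {"__builtins__": PLUGIN_BUILTINS}
    exec(source, scope)
    return scope

ok = run_plugin("import math\nroot = math.sqrt(16)")
print(ok["root"])  # 4.0

try:
    run_plugin("import os")
    refused = False
except ImportError:
    refused = True
print(refused)  # True
```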
Do you want users to write plugins, and not be able to tinker with what the main body of code in your application does? Consider that users can decompile .pyc files and modify them. Assume that those running your code can always modify it, and consider the gold-farming bots for WoW.
One Linux-only solution, similar to the sandboxed web-ish model, is to use AppArmor, which limits which files your app can access and which system calls it can make. This might be a feasible solution, but I don't know much about it so I can't give you advice other than "investigate".
If all you worry about is evil people modifying code while it's in transit in the intertubes, standard cryptographic solutions exist (SSL). If you want to only load signed plugins (because you want to control what the users do?), signing code sounds like the right solution (but beware of crafty users or evil people who edit the .pyc files and disable the is-it-signed check).
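The check-before-load step could be sketched as follows. Note the substitution: this uses an HMAC from the standard library as a stand-in, which is a shared-secret scheme and not a true digital signature; a real deployment would verify a public-key signature with a crypto library so that the application never holds the signing key. The function names and the key are hypothetical.

```python
import hashlib
import hmac

def sign_plugin(data, key):
    """Return a hex tag for the plugin bytes (HMAC-SHA256 stand-in)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify_plugin(data, key, tag):
    """Constant-time comparison of the expected and supplied tags."""
    return hmac.compare_digest(sign_plugin(data, key), tag)

key = b"demo-key-not-for-production"     # hypothetical secret
egg_bytes = b"contents of plugin.egg"     # would be read from the egg file
tag = sign_plugin(egg_bytes, key)

print(verify_plugin(egg_bytes, key, tag))                # True
print(verify_plugin(egg_bytes + b"tampered", key, tag))  # False
```

As the answer above warns, the check only helps if the code performing it cannot itself be edited by the user.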
Maybe some crypto library like this http://chandlerproject.org/Projects/MeTooCrypto helps to build an ad-hoc solution. Example usage: http://tdilshod.livejournal.com/38040.html