sandbox to execute possibly unfriendly python code [duplicate] - python

This question already has answers here:
How can I sandbox Python in pure Python?
(7 answers)
Python, safe, sandbox [duplicate]
(9 answers)
Closed 9 years ago.
Let's say there is a server on the internet that one can send a piece of code to for evaluation. At some point server takes all code that has been submitted, and starts running and evaluating it. However, at some point it will definitely bump into "os.system('rm -rf *')" sent by some evil programmer. Apart from "rm -rf" you could expect people try using the server to send spam or dos someone, or fool around with "while True: pass" kind of things.
Is there a way to coop with such unfriendly/untrusted code? In particular I'm interested in a solution for python. However if you have info for any other language, please share.

If you are not specific to CPython implementation, you should consider looking at PyPy[wiki] for these purposes — this Python dialect allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval.
Moreover, you can provide dictionary-like object instead of real dictionary and trace what untrusted code does with it's namespace.
Moreover, you can actually trace that code (issuing sys.settrace() inside restricted environment before any other code executed) so you can break execution if something will go bad.
If none of solutions is acceptable, use OS-level sandboxing like chroot, unionfs and standard multiprocess python module to spawn code worker in separate secured process.

You can check pysandbox which does just that, though the VM route is probably safer if you can afford it.

It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. Javascript is a pretty good example of this, people run arbitrary javascript code all the time on their computers -- it's supposed to be sandboxed but there's all sorts of security problems and edge conditions that crop up.
I'm not saying don't try, you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAffee) trying to understand how to detect 'bad code' -- and every day machines running McAffe anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.

I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
Number of users and what kind of code you expect to test/run would have considerable influence on choices btw. If they aren't expected to link to files or databases, or run computationally intensive tasks, and you have very low pressure, you could be almost fine by just preventing file access entirely and imposing a time limit on the process before it gets killed and the submission flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're in for a lot of work probably.

You can try some generic sanbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.

I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually its cool if they bugger it. It wont solve a while True: pass but rm -rf / won't matter.

Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.

Related

How to sandbox students' Python 3 code submissions for automatic assignment evaluation on Windows 10?

I am the TA for a coding class in which students will have to write Python 3 scripts to solve programming problems. An assignment consists of several problems, and for each problem the student is supposed to write a python program that will read input from standard input and write the output to the standard output. And for each problem there will be hidden test cases that we will use to evaluate their codes and grade them accordingly. So the idea is to automatize this process as much as possible. The problem is how to implement the whole framework to run students' assignments without compromising the safety of the system the assignments will be running on, which will probably be my laptop (which has Windows 10). I need to set up some kind of sandbox for Python 3, establishing limits for execution time, memory usage, disallowing access to the file system, networking, limiting imports only to safe modules from Python's standard library, etc.
Conceptually speaking I would like some kind of sand-boxed service that can receive a python script + some tests cases, the service runs the python script against the test cases in a safe environment (detecting compilation errors, time limit exceeded errors, memory limit exceeded errors, attempts to use forbidden libraries, etc.) and reporting the results back. So from Windows I can simply write a simple script that iterates over all students submissions and uses this service as a black-box to evaluate them.
Is anything like that possible on Windows 10? If so, how? My educated guess is that something like Docker or a Virtual Machine might be useful, but to be honest I'm not really sure because I lack enough expertise in these technologies, so I'm open to any suggestions.
Any advises on how to set up a secure system for automatic evaluation of untrusted Python 3 code submissions will be very appreciated.
What you are looking for a system that automatically evaluates a code using test cases.
You can use CMS to satisfy your use case. It is mainly a system to manage a programming contest, but it will be perfect for what you are trying to accomplish in your class.

Saving the stack?

I'm just curious, is it possible to dump all the variables and current state of the program to a file, and then restore it on a different computer?!
Let's say that I have a little program in Python or Ruby, given a certain condition, it would dump all the current variables, and current state to a file.
Later, I could load it again, in a different machine, and return to it.
Something like VM snapshot function.
I've seen here a question like this, but Java related, saving the current JVM and running it again in a different JVM. Most of the people told that there was nothing like that, only Terracotta had something, still, not perfect.
Thank you.
To clarify what I am trying to achieve:
Given 2 or more Raspberry Pi's, I'm trying to run my software at Pi nº1, but then, when I need to do something different with it, I need to move the software to Pi nº2 without dataloss, only a minor break time.
And so on, to an unlimited number of machines.
Its seams that I was trying to re-invent the wheel.
Check this links:
http://en.wikipedia.org/wiki/Application_checkpointing#DMTCP
http://www.linuxscrew.com/2007/10/17/cryopid-freeze-and-unfreeze-processes-in-linux/
Good question.
In Smalltalk, yes.
Actually, in Smalltalk, dumping the whole program and restarting is the only way to store and share programs. There are no source files and there is no way of starting a program from square zero. So in Smalltalk you would get your feature for free.
The Smalltalk VM offers a hook where each object can register to restore its externals resources after a restart, like reopening files and internet connections. But also, for example integer arrays are registered to that hook to change the endianness of their values on case the dump has been moved to a machine with different endianness.
This might give a hunch at how difficult (or not) it might turn our to achieve this in a language which does not support resumable dumps by design.
All other languages are, alas, much less live. Except for some Lisp implementation, I would not know of any language which supports resuming from a memory dump.
Which is a missed opportunity.
I've seen Mariano demonstrate that using Fuel (object serialization) in Pharo Smalltalk on a recent Esug conference. You could continue debugging and running as long as you don't hit objects not serialized. Squeak smalltalk runs on the Pi, and if saving an image is good enough for you, this is trivial. We're still waiting for the faster JITting VM for ARM though (part of the Google Summer of Code programme)

Code interpreter in a web service

I'd like to build a website with a sandboxed interpreter (or compiler) either on the client side of on the server side that can take short blocks of code (python/java/c/c++ any common language would do) as input and execute it.
What I want to build is a place where given a programming question, the user can type in the solution and we can run it through some test cases, to either approve the solution or provide a test case where it breaks.
Looking for pointers to libraries, existing implementation or a general idea.
Any help much appreciated.
There are many contest websites that do something like this-- TopCoder and Timus Online Judge are two examples. They don't have much information on the technology, however.
codepad.org is the closest to what you want to do. They run programs on heavily sandboxed and firewalled EC2 servers that are periodically wiped, to prevent exploits.
Codepad is at least partially based on geordi, an IRC bot designed to run arbitrary C++ programs. It uses Haskell and traps system calls to prevent harmful activity.
Of slightly less interest, one of Google App Engine's example projects is a Python shell. It relies on GAE's server-side sandboxing to prevent malicious activity.
In terms of interface, the simplest would be to do something like the Internation Informatics Olympiad. Have people write a function with a certain name in the target language, then invoke that from your testing framework. Have simple functions that will let them request information from the framework, if necessary.
For Python you can compile PyPy in sandboxed mode which gives you a complete interpreter and full standard library but without the ability to execute arbitrary system calls. You can also limit the runtime and heap size of executed scripts.
Here's some code I wrote a while back to execute an arbitrary string containing a Python script in the pypy-sandbox binary and return the output. You can call this code from regular CPython.
Take a look at the paper An Enticing Environment for Programming which discusses building just such an environment.

DIsable Python module

Is there any way to disable a module from being loaded on my system? Let's say i would like to restrict my users from accessing the subprocess or popen2 module. Something like PHP's 'disabled_functions' or any similar method to achieve the same thing.
As #Thomas points out, blacklisting is a pretty poor mechanism for implementing any security mechanisms. Whitelisting is a much safer approach.
But a mechanism inside the interpreter isn't particularly excellent for any number of reasons: flaws in the interpreter that are exploitable at the source code level would allow users to walk right past any mechanisms built in at that level (and the PHP team asked Linux vendors to stop calling this a security problem, because (a) they fixed one of these every week and (b) trying to confine an untrusted user-supplied script is pretty much an impossible task -- use FastCGI or similar tools for potentially untrusted scripts).
The Python interpreter is probably not designed to handle malicious input, so don't treat it as such.
If you really want to confine what untrusted users can do with Python scripts, a few pieces of advice: Do not use mod_python or anything like it. Use FastCGI or similar tools that you let specify the user account that should run the script and won't execute the script as your webserver user. And learn how to configure SELinux or AppArmor to confine what that process can do -- an hour setting up one of these tools might save you huge headaches down the road, plus you get to laugh at all the cute little exploit attempts that fail.

Writing a parallel programming framework, what have I missed?

Clarification: As per some of the comments, I should clarify that this is intended as a simple framework to allow execution of programs that are naturally parallel (so-called embarrassingly parallel programs). It isn't, and never will be, a solution for tasks which require communication or synchronisation between processes.
I've been looking for a simple process-based parallel programming environment in Python that can execute a function on multiple CPUs on a cluster, with the major criterion being that it needs to be able to execute unmodified Python code. The closest I found was Parallel Python, but pp does some pretty funky things, which can cause the code to not be executed in the correct context (with the appropriate modules imported etc).
I finally got tired of searching, so I decided to write my own. What I came up with is actually quite simple. The problem is, I'm not sure if what I've come up with is simple because I've failed to think of a lot of things. Here's what my program does:
I have a job server which hands out jobs to nodes in the cluster.
The jobs are handed out to servers listening on nodes by passing a dictionary that looks like this:
{
'moduleName':'some_module',
'funcName':'someFunction',
'localVars': {'someVar':someVal,...},
'globalVars':{'someOtherVar':someOtherVal,...},
'modulePath':'/a/path/to/a/directory',
'customPathHasPriority':aBoolean,
'args':(arg1,arg2,...),
'kwargs':{'kw1':val1, 'kw2':val2,...}
}
moduleName and funcName are mandatory, and the others are optional.
A node server takes this dictionary and does:
sys.path.append(modulePath)
globals()[moduleName]=__import__(moduleName, localVars, globalVars)
returnVal = globals()[moduleName].__dict__[funcName](*args, **kwargs)
On getting the return value, the server then sends it back to the job server which puts it into a thread-safe queue.
When the last job returns, the job server writes the output to a file and quits.
I'm sure there are niggles that need to be worked out, but is there anything obvious wrong with this approach? On first glance, it seems robust, requiring only that the nodes have access to the filesystem(s) containing the .py file and the dependencies. Using __import__ has the advantage that the code in the module is automatically run, and so the function should execute in the correct context.
Any suggestions or criticism would be greatly appreciated.
EDIT: I should mention that I've got the code-execution bit working, but the server and job server have yet to be written.
I have actually written something that probably satisfies your needs: jug. If it does not solve your problems, I promise you I'll fix any bugs you find.
The architecture is slightly different: workers all run the same code, but they effectively generate a similar dictionary and ask the central backend "has this been run?". If not, they run it (there is a locking mechanism too). The backend can simply be the filesystem if you are on an NFS system.
I myself have been tinkering with batch image manipulation across my computers, and my biggest problem was the fact that some things don't easily or natively pickle and transmit across the network.
for example: pygame's surfaces don't pickle. these I have to convert to strings by saving them in StringIO objects and then dumping it across the network.
If the data you are transmitting (eg your arguments) can be transmitted without fear, you should not have that many problems with network data.
Another thing comes to mind: what do you plan to do if a computer suddenly "disappears" while doing a task? while returning the data? do you have a plan for re-sending tasks?

Categories