How can I safely run untrusted python code?

Here is the scenario: my website has to run code on my server that is generated by website users and is therefore unsafe.
I want to disable some Python keywords and built-ins, such as eval, exec, print and so on, to protect my running environment.
Is there a simple way to implement this without changing the Python interpreter? My Python version is 2.7.10.
Many thanks.

Disabling names on python level won't help as there are numerous ways around it. See this and this post for more info. This is what you need to do:
For CPython, use RestrictedPython to define a restricted subset of Python.
For PyPy, use sandboxing. It allows you to run arbitrary python code in a special environment that serializes all input/output so you can check it and decide which commands are allowed before actually running them.
Since version 3.8, Python supports audit hooks, which let you intercept certain actions and abort them:
import sys
def audit(event, args):
    if event == 'compile':
        sys.exit('nice try!')
sys.addaudithook(audit)
eval('5')
Additionally, to protect your host OS, use
either virtualization (safer) such as KVM or VirtualBox
or containerization (much lighter) such as lxd or docker
In the case of containerization with docker you may need to add AppArmor or SELinux policies for extra safety. lxd already comes with AppArmor policies by default.
Make sure you run the code as a user with as few privileges as possible.
Rebuild the virtual machine/container for each user.
Whichever solution you use, don't forget to limit resource usage (RAM, CPU, storage, network). Use cgroups if your chosen virtualization/containerization solution does not support these kinds of limits.
Last but not least, use timeouts to prevent your users' code from running forever.
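As a rough illustration of the last two points, here is a minimal sketch that runs a snippet in a child interpreter with a wall-clock timeout and soft CPU and memory limits. It assumes a POSIX system (the resource module and preexec_fn are not available on Windows); the function name run_limited and the default limits are made up for this example. It is not a sandbox on its own and is meant to sit inside the container or VM described above.

```python
import resource
import subprocess
import sys


def run_limited(source, timeout_s=2, cpu_s=1, mem_bytes=512 * 1024 * 1024):
    """Run `source` in a child interpreter; return (returncode, stdout).

    Returns (None, "timeout") if the wall-clock timeout expires.
    """

    def _cap(kind, value):
        # Lower only the soft limit, never exceeding the existing hard limit.
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(kind, (value, hard))

    def set_limits():
        # Executed in the child process just before the interpreter starts.
        _cap(resource.RLIMIT_CPU, cpu_s)     # CPU seconds
        _cap(resource.RLIMIT_AS, mem_bytes)  # address space in bytes

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
            preexec_fn=set_limits,
        )
    except subprocess.TimeoutExpired:
        return None, "timeout"
    return proc.returncode, proc.stdout
```

A busy loop is stopped by the CPU limit and a sleeping process by the wall-clock timeout, which is why both are set.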

One way is to shadow the methods:
def not_available(*args, **kwargs):
    return 'Not allowed'
eval = not_available
exec = not_available
print = not_available
However, someone smart can always do this:
import builtins
builtins.print('this works!')
So the real solution is to parse the code and not allow the input if it has such statements (rather than trying to disable them).
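A minimal sketch of that parsing idea, using the standard ast module, could look like the following. The banned-name list and the function name find_violations are assumptions made for this example, and as the other answer points out, a static check like this can be bypassed, so it should only ever be one layer on top of OS-level isolation.

```python
import ast

# Hypothetical deny-list for illustration; a real policy would need more care.
BANNED_NAMES = {"eval", "exec", "compile", "__import__", "open"}


def find_violations(source):
    """Return a list of (line number, description) for disallowed constructs."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append((node.lineno, "import statement"))
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            violations.append((node.lineno, node.id))
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attribute access is a common escape route.
            violations.append((node.lineno, node.attr))
    return violations
```

The input would be rejected whenever the returned list is non-empty, for example for the builtins.print snippet above, which is caught by the import rule.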

Related

Safely run executable in Node

I found myself having to implement the following use case: I need to run a webapp in which users can submit C programs, which need to be run safely on my backend.
I'm trying to get this done using Node. In the past, I had to do something similar but the user-submitted code was JavaScript code, and I got away with using Node vm2 module. Essentially, I would create a VM and call its run method with the user submitted code as a string argument, then collect the output and do whatever I had to.
I'm trying to understand if using the same module could help me with C code as well. The idea would be to use exec to first call gcc and compile the user code. Afterwards, I would use a VM to run exec again, this time passing the generated executable as an argument. Would this be safe?
I don't understand vm2 deeply enough to know whether the safety is only limited to executing JS code or if it can be trusted to also run any arbitrary shell command safely.
In case vm2 isn't appropriate, what would be another way to run an executable in a sandboxed fashion in Node? Feel free to also suggest Python-based solutions, if you know any. Please note that the code will still be executed in a separate container as the main app regardless, but I want to make extra sure users cannot easily just tear it down at their liking.
Thank you in advance.

I am currently experiencing the same challenge as you, trying to execute safely some untrusted code using spawn, so what I can tell you is that vm2 only works for JS/TS code, but can't control what happens to a new process created by spawn, fork or exec.
For now I haven't found any good solution, but I'm thinking of trying to run the process as a user with limited rights.
As you seem to have access to the C source code, I would advise you to search how to run untrusted C programs (in plain C), and see if you can manipulate the C code in order to have a safer environment from this point of view.

Pass root privilege to "os" commands in Python

I am adding functionality to a PyQt5 application. This new functionality involves copying, linking and removing files (and links) that may be in protected directories, so commands like os.symlink or shutil.copyfile would fail.
Of course the main application is not run with root privileges (and asking users to do so is out of question), so I need a workaround.
First thing is of course to wrap the critical code in try/except blocks and investigate any exceptions. If it turns out missing root privileges are the issue I would ask for the password in a dialog (presumably storing the password for as long as the current dialog is alive).
But I'm not sure how I can repeat the step with the root password. I would strongly prefer doing it in the Python domain (or does Qt provide some support for file operations? I'd bet it does but I couldn't find it). I think it should be possible by doing the file operations in a shell command and somehow pass the password to that, but since Python and PyQt were designed to shield the programmer from the intricacies of OS differences I would prefer to avoid that route.
Some pseudocode should give a clear idea of the question:
import os
def my_copy(source, dest):
    try:
        os.symlink(source, dest)
    except PermissionError:  # check for permission problem:
        # use dialog to ask for password
        # repeat the symlink procedure with password
        pass

What you're trying to do here is basically impossible on most modern operating systems, and for good reason.
Imagine a typical macOS or Windows user who expects to be able to auth by using the fingerprint reader instead of typing their password. Or a blind user. Or someone who's justifiably paranoid about your app storing their password in plaintext in a Python string in non-kernel memory (not to mention in a Windows dialog box whose events can be hooked by any process). This is why modern platforms come with a framework like libPAM/XSSO, Authorization Services, etc.
And the way privilege escalation works is so different between Windows and POSIX, or even between macOS and Linux, not to mention so rapidly evolving, that, as far as I know, there's no cross-platform framework to do it.
In fact, most systems discourage app-driven privilege escalation in the first place. On Windows, you often ask the OS to run a helper app with elevated privileges (and the OS will then apply the appropriate policy and decide what to ask for). On macOS, you usually write a LaunchServices daemon that you get permission for at install time (using special installer APIs) rather than runtime. For traditional non-server-y POSIX apps, you'd usually do something similar, but with a setuid helper that you can create at install time just because the installation runs as root. For traditional POSIX servers, you often start as root then drop privs after forking and execing a similar helper daemon.
If all of this seems like way more work than you wanted to deal with… well, honestly, that was probably the intention. OS designers don't want app designers introducing security holes, so they make sure you have to understand what the platform wants and how to work with it rather than against it before you can even try to do things like moving around files you don't have permissions for.

Wrap the try/except code in a loop with a counter:
import os
def my_copy(source, dest):
    for attempts in [1, 2]:
        try:
            os.symlink(source, dest)
            # we succeeded, so don't try any more
            break
        except PermissionError:  # check for permission problem:
            if attempts == 1:
                # use dialog to ask for password
                # repeat the symlink procedure with password
                pass
            else:
                # we already tried as root, and failed again
                break
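The "check for permission problem" step can be made explicit by looking at the error code instead of catching every exception. The sketch below only classifies the failure; the function name try_symlink and its return values are invented for this example, and what happens after "needs_privilege" (a platform-specific helper, as the other answer describes) is deliberately left out.

```python
import errno
import os


def try_symlink(source, dest):
    """Create a symlink; report whether missing privileges caused a failure."""
    try:
        os.symlink(source, dest)
        return "ok"
    except OSError as exc:
        # EACCES / EPERM are the errno values for permission problems.
        if exc.errno in (errno.EACCES, errno.EPERM):
            return "needs_privilege"
        # Anything else (missing directory, existing target, ...) is re-raised.
        raise
```

Keeping unrelated errors separate avoids prompting for a password when the real problem is, say, a target that already exists.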

How can I access Ring 0 with Python?

This answer, which states that the naming of classes in Python has nothing to do with special privileges, confuses me.
How can I access lower rings in Python?
Is the low-level io for accessing lower level rings?
If it is, which rings can I access with that?
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
C tends to be a prominent language in OS programming. Given that there is an os class in Python, does it mean that I can access C code through that class?
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things? If there are not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged?

Windows and Linux both use ring 0 for kernel code and ring 3 for user processes. The advantage of this is that user processes can be isolated from one another, so the system continues to run even if a process crashes. By contrast, a bug in ring 0 code can potentially crash the entire machine.
One of the reasons ring 0 code is so critical is that it can access hardware directly. By contrast, when a user-mode (ring 3) process needs to read some data from a disk:
1. the process executes a special instruction telling the CPU it wants to make a system call
2. CPU switches to ring 0 and starts executing kernel code
3. kernel checks that the process is allowed to perform the operation
4. if permitted, the operation is carried out
5. kernel tells the CPU it has finished
6. CPU switches back to ring 3 and returns control to the process
Processes belonging to "privileged" users (e.g. root/Administrator) run in ring 3 just like any other user-mode code; the only difference is that the check at step 3 always succeeds. This is a good thing because:
root-owned processes can crash without taking the entire system down
many user-mode features are unavailable in the kernel, e.g. swappable memory, private address space
As for running Python code in lower rings - kernel-mode is a very different environment, and the Python interpreter simply isn't designed to run in it, e.g. the procedure for allocating memory is completely different.
In the other question you reference, both os.open() and open() end up making the open() system call, which checks whether the process is allowed to open the corresponding file and performs the actual operation.
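To make that concrete, here is a small sketch that reads the same file through both interfaces. The helper name read_both_ways is made up for this example; the calls themselves (os.open, os.read, os.close and the built-in open) are standard library functions.

```python
import os


def read_both_ways(path):
    """Read up to 1024 bytes via os.open()/os.read() and via built-in open()."""
    # Low-level: a raw file descriptor, no buffering, manual close.
    fd = os.open(path, os.O_RDONLY)
    try:
        low = os.read(fd, 1024)
    finally:
        os.close(fd)
    # High-level: a buffered file object wrapping the same kind of descriptor.
    with open(path, "rb") as f:
        high = f.read(1024)
    return low, high
```

Both paths run entirely in user mode and go through the same kernel permission check, so neither gives the process any access the other lacks.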

I think SimonJ's answer is very good, but I'm going to post my own because from your comments it appears you're not quite understanding things.
Firstly, when you boot an operating system, what you're doing is loading the kernel into memory and saying "start executing at address X". The kernel, that code, is essentially just a program, but of course nothing else is loaded, so if it wants to do anything it has to know the exact commands for the specific hardware it has attached to it.
You don't have to run a kernel. If you know how to control all the attached hardware, you don't need one, in fact. However, it was rapidly realised way back when that there are many types of hardware one might face and having an identical interface across systems to program against would make code portable and generally help get things done faster.
So the function of the kernel, then, is to control all the hardware attached to the system and present it through a common interface, called an API (application programming interface). Programs that run on the system don't talk directly to hardware; they talk to the kernel. So userland programs don't need to know how to ask a specific hard disk to read sector 0x213E or whatever, but the kernel does.
Now, the description of ring 3 provided in SimonJ's answer is how userland is implemented - with isolated, unprivileged processes with virtual private address spaces that cannot interfere with each other, for the benefits he describes.
There's also another level of complexity in here, namely the concept of permissions. Most operating systems have some form of access control, whereby "administrators" have total control of the system and "users" have a restricted subset of options. So a request to the kernel from an ordinary user to open a file belonging to an administrator should fail under this sort of approach. The user who runs the program forms part of the program's context, if you like, and what the program can do is constrained by what that user can do.
Most of what you could ever want to achieve (unless your intention is to write a kernel) can be done in userland as the root/administrator user, where the kernel does not deny any API requests made to it. It's still a userland program. It's still a ring 3 program. But for most (nearly all) uses it is sufficient. A lot can be achieved as a non-root/administrative user.
That applies to the python interpreter and by extension all python code running on that interpreter.
Let's deal with some uncertainties:
The naming of os and sys I think is because these are "systems" tasks (as opposed to say urllib2). They give you ways to manipulate and open files, for example. However, these go through the python interpreter which in turn makes a call to the kernel.
I do not know of any kernel-mode python implementations. Therefore to my knowledge there is no way to write code in python that will run in the kernel (linux/windows).
There are two types of privileged: privileged in terms of hardware access and privileged in terms of the access control system provided by the kernel. Python can be run as root/an administrator (indeed on Linux many of the administration gui tools are written in python), so in a sense it can access privileged code.
Writing a C extension or controlling a C application from Python would ostensibly mean you are either using code added to the interpreter (userland) or controlling another userland application. However, if you wrote a kernel module in C (Linux) or a driver in C (Windows), it would be possible to load that driver and interact with it via the kernel APIs from Python. An example might be creating a /proc entry in C and then having your Python application pass messages via read/write to that /proc entry (which the kernel module would have to handle via a write/read handler). Essentially, you write the code you want to run in kernel space and add to or extend the kernel API in one of many ways so that your program can interact with that code.
"Low-level" IO means having more control over the type of IO that takes place and how you get that data from the operating system. It is low level compared to higher level functions still in Python that give you easier ways to read files (convenience at the cost of control). It is comparable to the difference between read() calls and fread() or fscanf() in C.
Health warning: Writing kernel modules, if you get it wrong, will at best result in that module not being properly loaded; at worst your system will panic/bluescreen and you'll have to reboot.
The final point about machine instructions I cannot answer here. It's a totally separate question and it depends. There are many tools capable of analysing code like that I'm sure, but I'm not a reverse engineer. However, I do know that many of these tools (gdb, valgrind), i.e. tools that hook into binary code, do not need kernel modules to do their work.

You can use inpout library http://logix4u.net/parallel-port/index.php
import ctypes
# Example of strobing data out with nStrobe pin (note - inverted)
# Get 50kbaud without the read, 30kbaud with
read = []
for n in range(4):
    ctypes.windll.inpout32.Out32(0x37a, 1)
    ctypes.windll.inpout32.Out32(0x378, n)
    read.append(ctypes.windll.inpout32.Inp32(0x378))  # Dummy read to see what is going on
    ctypes.windll.inpout32.Out32(0x37a, 0)
print(read)

[note: I was wrong. usermode code can no longer access ring 0 on modern unix systems. -- jc 2019-01-17]
I've forgotten what little I ever knew about Windows privileges. In all Unix systems with which I'm familiar, the root user can access all ring0 privileges. But I can't think of any mapping of Python modules with privilege rings.
That is, the 'os' and 'sys' modules don't give you any special privileges. You have them, or not, due to your login credentials.

How can I access lower rings in Python?
ctypes
Is the low-level io for accessing lower level rings?
No.
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
Something else.
C tends to be a prominent language in OS programming. Given that there is an os class in Python, does it mean that I can access C code through that class?
All of CPython is implemented in C.
The os module (it's not a class, it's a module) is for accessing OS APIs. C has nothing to do with access to OS APIs. Python accesses the APIs "directly".
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things?
"playing with"?
"understand what it means"? is your problem. You read the code, you understand it. Whether or not Python can help is impossible to say. What don't you understand?
If there is not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
ctypes
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged?
You don't "wrap" things to control privileges.
Most OS's work like this.
You grant privileges to a user account.
The OS API's check the privileges granted to the user making the OS API request.
If the user has the privileges, the OS API works.
If the user lacks the privileges, the OS API raises an exception.
That's all there is to it.
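Since ctypes comes up twice above, here is a minimal sketch of what it actually does: it calls C functions from ordinary user-mode Python. The example assumes a POSIX system where ctypes.CDLL(None) exposes the C library already loaded into the interpreter; the wrapper name c_getpid is made up for illustration.

```python
import ctypes
import os

# On POSIX, CDLL(None) gives access to symbols already loaded into the
# process, which includes the C library. This does not work on Windows.
libc = ctypes.CDLL(None)


def c_getpid():
    """Call the C library's getpid() directly through ctypes."""
    return libc.getpid()


# The C call and the os module report the same process ID.
print(c_getpid() == os.getpid())
```

The call still runs in ring 3 with the privileges of the current user; ctypes changes which code you can call, not what the OS lets the process do.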

setfsuid() and python 2.5.4

I'm trying to use setfsuid() with python 2.5.4 and RHEL 5.4.
Since it's not included in the os module, I wrapped it in a C module of my own and installed it as a python extension module using distutils.
However when I try to use it I don't get the expected result.
setfsuid() returns a value indicating success (changing from a superuser), but I can't access files to which only the newly set user should have user access (using open()), indicating that the fsuid was not truly changed.
I tried to verify that setfsuid() worked by calling it twice in a row with the same user ID.
The result was as if nothing had changed: on every call, the returned value was the old user ID, not the new one. I also called getpid() from the module and from the Python script, and both returned the same ID, so this is not the problem.
Just in case it's significant, I should note that I'm doing all of this from within an Apache daemon process (WSGI).
Can anyone explain this?
Thank you

The ability to change the FSUID is limited to either root or non-root processes with the CAP_SETUID capability. These days it's usually considered bad practice to run a webserver with root permissions so, most likely, you'll need to set the capability on the web server's executable file (see man capabilities for details). Please note that doing this could severely affect your overall system's security. I'd recommend considering spawning a small backend process that runs as root and converses with your WSGI app via a local UNIX socket prior to mucking with the security of a high-profile target like Apache.
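One detail that makes this hard to debug is that setfsuid() returns the previous filesystem user ID whether or not the change succeeded, so the usual check is to call it a second time and compare. The sketch below does that through ctypes rather than a custom C extension; it assumes Linux with a C library that exports setfsuid, and the helper name set_fsuid_checked is made up for this example.

```python
import ctypes
import os

# Linux only: setfsuid is a Linux-specific call exported by the C library.
libc = ctypes.CDLL(None)
libc.setfsuid.argtypes = [ctypes.c_uint]
# The C prototype returns int; reading it as unsigned keeps large UIDs intact.
libc.setfsuid.restype = ctypes.c_uint


def set_fsuid_checked(uid):
    """Try to set the filesystem UID and report whether it took effect."""
    libc.setfsuid(uid)
    # The second call returns the fsuid that was in effect after the first one.
    current = libc.setfsuid(uid)
    return current == uid


# Setting the fsuid to the current effective UID is always permitted.
print(set_fsuid_checked(os.geteuid()))
```

If the function returns False for another user's ID, the process lacks the required privilege, which matches the symptom described in the question.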

Execute arbitrary python code remotely - can it be done?

I'm working on a grid system which has a number of very powerful computers. These can be used to execute python functions very quickly. My users have a number of python functions which take a long time to calculate on workstations, ideally they would like to be able to call some functions on a remote powerful server, but have it appear to be running locally.
Python has an old function called "apply" - it's mostly useless these days now that python supports the extended-call syntax (e.g. **arguments), however I need to implement something that works a bit like this:
rapply = Rapply( server_hostname ) # Set up a connection
result = rapply( fn, args, kwargs ) # Remotely call the function
assert result == fn( *args, **kwargs ) #Just as a test, verify that it has the expected value.
Rapply should be a class which can be used to remotely execute some arbitrary code (fn could be literally anything) on a remote server. It will send back the result which the rapply function will return. The "result" should have the same value as if I had called the function locally.
Now let's suppose that fn is a user-provided function. I need some way of sending it over the wire to the execution server. If I could guarantee that fn was always something simple, it could just be a string containing Python source code... but what if it were not so simple?
What if fn might have local dependencies: It could be a simple function which uses a class defined in a different module, is there a way of encapsulating fn and everything that fn requires which is not standard-library? An ideal solution would not require the users of this system to have much knowledge about python development. They simply want to write their function and call it.
Just to clarify, I'm not interested in discussing what kind of network protocol might be used to implement the communication between the client & server. My problem is how to encapsulate a function and its dependencies as a single object which can be serialized and remotely executed.
I'm also not interested in the security implications of running arbitrary code on remote servers - let's just say that this system is intended purely for research and it is within a heavily firewalled environment.

Take a look at PyRO (Python Remote Objects). It has the ability to set up services on all the computers in your cluster, and invoke them directly, or indirectly through a name server and a publish-subscribe mechanism.

It sounds like you want to do the following.
1. Define a shared filesystem space.
2. Put ALL your Python source in this shared filesystem space.
3. Define simple agents or servers that will "execfile" a block of code.
4. Your client then contacts the agent (REST protocol with POST methods works well for this) with the block of code.
5. The agent saves the block of code and does an execfile on that block of code.
Since all agents share a common filesystem, they all have the same Python library structure.
We do this with a simple WSGI application we call "batch server". We have a RESTful protocol for creating and checking on remote requests.
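The agent's "save and execfile" step might look roughly like this in Python 3, where execfile no longer exists and runpy.run_path is one standard replacement. This is only a sketch of that single step, not the batch server described above: the function name save_and_run is invented, and the HTTP layer and any error handling are left out.

```python
import os
import runpy
import tempfile


def save_and_run(code, workdir):
    """Save a received block of code under workdir and execute it.

    Returns the resulting module globals, so the caller can pick out results.
    """
    fd, path = tempfile.mkstemp(suffix=".py", dir=workdir)
    with os.fdopen(fd, "w") as f:
        f.write(code)
    # run_path executes the file and returns its global namespace as a dict.
    return runpy.run_path(path)
```

Because workdir would live on the shared filesystem, imports inside the submitted code resolve against the same library tree on every agent.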

Stackless had the ability to pickle and unpickle running code. Unfortunately, the current implementation doesn't support this feature.
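For context on why pickling comes up at all: the standard pickle module serializes a function by reference (module plus name), not by value, so only functions the remote side can import by that name survive the trip. The sketch below demonstrates the limitation and the "send the source as a string" fallback mentioned in the question; both helper names are made up for this example.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def load_function_from_source(source, name):
    """Rebuild a function on the receiving side from its source text."""
    namespace = {}
    exec(source, namespace)  # acceptable only because the code is trusted here
    return namespace[name]


print(can_pickle(len))             # importable by name, so it pickles
print(can_pickle(lambda x: x))     # no importable name, so it does not
double = load_function_from_source("def double(x):\n    return 2 * x\n", "double")
print(double(21))
```

Neither approach carries the function's dependencies along, which is why the answers here lean on a shared filesystem or a ready-made framework.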

You could use a ready-made clustering solution like Parallel Python. You can relatively easily set up multiple remote slaves and run arbitrary code on them.

You could use a SSH connection to the remote PC and run the commands on the other machine directly. You could even copy the python code to the machine and execute it.

Syntax:
cat ./test.py | sshpass -p 'password' ssh user@remote-ip "python - script-arguments-if-any for test.py script"
1) Here "test.py" is the local Python script.
2) sshpass is used to pass the SSH password to the ssh connection.
