Arbitrary Code Execution with Docker

Arbitrary Code Execution with Docker - python

I'm thinking about building a web app that would involve users writing small segments of python and the server testing that code. However, this presents a ton of security concerns. Would docker be a good isolation tool for running this potentially malicious code? From what I've read, checking system calls with ptrace is a possibility, but I would prefer to use a preexisting tool.

Docker is indeed very suitable for this kind of usage. However, please note that docker is NOT yet ready for production usage.
I would recommend to create a new container and give non-root privileges to your users to this container. One container per user.
This way, you can prepare your docker image and prepare the environment and control precisely what your users are doing :)

Related

Can a website's controlling Python code be viewed?

I am trying to place a simple Flask app within a Docker container to be hosted on Firebase as per David East's article on https://medium.com/firebase-developers/hosting-flask-servers-on-firebase-from-scratch-c97cfb204579
Within the app, I have used Flask email to send emails automatically. Is it safe to leave the password as a string in the Python code?

It's extremely unsafe. The password shouldn't be in the code at all. Rotate the password immediately if you're concerned it might be compromised.
There are two important details about Docker that matter here. The first is that it's very easy to get content out of an image, especially if it's in an interpreted language like Python; an interested party can almost certainly docker run --rm -it --entrypoint sh your-image to get an interactive shell to poke around, and it's impossible to prevent this. The other is that it's basically trivial to use Docker to root the host – docker run --rm -it -v /:/host busybox sh can read and write any host file as root, including the internal Docker storage – and so there is a fairly high level of trust involved.
Including passwords in code at all is usually a mistake, and it's something most security scans will flag. If it's included in your code then it's probably checked into source control unencrypted, which also is a security issue. It being embedded in the code also probably makes it harder to change since the system operator won't have access to the code.
In a Docker context, often the best way to pass a credential is through a docker run -e environment variable; your Python code would see it in the os.environ dictionary. Passing it via a file that is not checked in to source control is arguably more secure, but also more complex, and I don't think the security gain is significant.

How to trace code in a Heroku production system

My Python/Django code behaves different in the heroku production code, than on my development machine.
I would like to debug/trace it.
Since it runs on Heroku. AFAIK I can't insert import pydevd_pycharm; pydevd_pycharm.settrace(... into the code.
I use PyCharm.
But I don't need a fancy GUI. A command-line tool would be fine, too.
I would be happy if I could see all lines which get executed during a particular http request.
How to solve this for production systems?

In order to understand the difference between your local development environment and the Heroku production I would first deploy the application on another Heroku Dyno, for example a Free Dyno which you can easily create and manage.
You can then integrate the tools you want and add the log statements as needed.
Even if you are able to debug/inspect the production runtime it is very important to be able to test on production-like systems to capture problems early and investigate problems without guessing.
On the Prod system you have limited options to debug the application:
consider code changes (i.e. add logging stamements) but as you have pointed out this involves PRs and a new release
debugger: connect your favourite debugger (i.e. PyCharm) to the remote application. This is something that (almost) no one does (given the security aspects and the likely impact on the application performance) and I doubt your system admins/DevOps would agree

I don't know of any tool which can do that, but you shouldn't run into this problem very often. So I wouldn't bother trying to solve this generally, but instead just add logging statements where you think they can be handy to debug this one problem.

Is distributing python source code in Docker secure?

I am about to decide on programming language for the project.
The requirements are that some of customers want to run application on isolated servers without external internet access.
To do that I need to distribute application to them and cannot use SaaS approach running on, for example, my cloud (what I'd prefer to do...).
The problem is that if I decide to use Python for developing this, I would need to provide customer with easy readable code which is not really what I'd like to do (of course, I know about all that "do you really need to protect your source code" kind of questions but it's out of scope for now).
One of my colleagues told me about Docker. I can find dozen of answers about Docker container security. Problem is all that is about protecting (isolating) host from code running in container.
What I need is to know if the Python source code in the Docker Image and running in Docker Container is secured from access - can user in some way (doesn't need to be easy) access that Python code?
I know I can't protect everything, I know it is possible to decompile/crack everything. I just want to know the answer just to decide whether the way to access my code inside Docker is hard enough that I can take the risk.

Docker images are an open and documented "application packaging" format. There are countless ways to inspect the image contents, including all of the python source code shipped inside of them.
Running applications inside of a container provides isolation from the application escaping the container to access the host. They do not protect you from users on the host inspecting what is occurring inside of the container.

Python programs are distributed as source code. If it can run on a client machine, then the code is readable on that machine. A docker container only contains the application and its libraries, external binaries and files, not a full OS. As the security can only be managed at OS level (or through encryption) and as the OS is under client control, the client can read any file on the docker container, including your Python source.
If you really want to go that way, you should consider providing a full Virtual Machine to your client. In that case, the VM contains a full OS with its account based security (administrative account passwords on the VM can be different from those of the host). Is is far from still waters, because it means that the client will be enable to setup or adapt networking on the VM among other problems...
And you should be aware the the client security officer could emit a strong NO when it comes to running a non controlled VM on their network. I would never accept it.
Anyway, as the client has full access to the VM, really securing it will be hard if ever possible (disable booting from an additional device may even not be possible). It is admitted in security that if the attacker has physical access, you have lost.
TL/DR: It in not the expected answer but just don't. It you sell your solution you will have a legal contract with your customer, and that kind of problem should be handled at a legal level, not a technical one. You can try, and I have even given you a hint, but IMHO the risks are higher than the gain.

I know that´s been more than 3 years, but... looking for the same kind of solution I think that including compiled python code -not your source code- inside the container would be a challenging trial for someone trying to access your valuable source code.
If you run pyinstaller --onefile yourscript.py you will get a compiled single file that can be run as an executable. I have only tested it in Raspberry, but as far as I know it´s the same for, say, Windows.
Of course anything can be reverse engineered, but hopefully it won´t be worth the effort to the regular end user.

I think it could be a solution as using a "container" to protect our code from the person we wouldn't let them access. the problem is docker is not a secure container. As the root of the host machine has the most powerful control of the Docker container, we don't have any method to protect the root from accessing inside of the container.
I just have some ideas about a secure container:
Build a container with init file like docker file, a password must be set when the container is created;
once the container is built, we have to use a password to access inside, including
reading\copy\modify files
all the files stored on the host machine should be encypt。
no "retrieve password" or “--skip-grant-” mode is offered. that means nobody can
access the data inside the container if u lost the password.
If we have a trustable container where we can run tomcat or Django server, code obfuscation will not be necessary.

Why not run uwsgi instances as root

I am reading through the uWSGI documentation and it warns to always avoid running your uWSGI instances as root. What is the reason behind this?
Does it matter if it is the only process (besides nginx) running in a docker container, serving up a flask application?

In general, security reasoning says that running as root as bad. If there were any kind of bug, for example a code execution bug that can allow anybody to execute arbitrary code they would be able to destroy your entire system.
If you don't run the process as root, any code execution vulnerabilities would need to be paired with a secondary privilege escalation vulnerability in order to destroy your system.
In a docker container, this is mitigated slightly in that you'll be able to recover your old system relatively easily, however, it is generally still a bad practice or habit to allow processes to run as root as a malicious attacker can and will steal the information that may exist on your server or turn your server into a malware delivery mechanism.

Encrypted and secure docker containers

We all know situations when you cannot go open source and freely distribute software - and I am in one of these situations.
I have an app that consists of a number of binaries (compiled from C sources) and Python code that wraps it all into a system. This app used to work as a cloud solution so users had access to app functions via network but no chance to touch the actual server where binaries and code are stored.
Now we want to deliver the "local" version of our system. The app will be running on PCs that our users will physically own. We know that everything could be broken, but at least want to protect the app from possible copying and reverse-engineering as much as possible.
I know that Docker is a wonderful deployment tool so I wonder: is it possible to create encrypted Docker containers where no one can see any data stored in the container's filesystem? Is there a known solution to this problem?
Also, maybe there are well known solutions not based on Docker?

The root user on the host machine (where the docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or protecting RAM from debugging.
Using obfuscation on a standard Linux box, you can make it harder to read the file system and RAM, but you can't make it impossible or the container cannot run.
If you can control the hardware running the operating system, then you might want to look at the Trusted Platform Module which starts system verification as soon as the system boots. You could then theoretically do things before the root user has access to the system to hide keys and strongly encrypt file systems. Even then, given physical access to the machine, a determined attacker can always get the decrypted data.

What you are asking about is called obfuscation. It has nothing to do with Docker and is a very language-specific problem; for data you can always do whatever mangling you want, but while you can hope to discourage the attacker it will never be secure. Even state-of-the-art encryption schemes can't help since the program (which you provide) has to contain the key.
C is usually hard enough to reverse engineer, for Python you can try pyobfuscate and similar.
For data, I found this question (keywords: encrypting files game).

If you want a completely secure solution, you're searching for the 'holy grail' of confidentiality: homomorphous encryption. In short, you want to encrypt your application and data, send them to a PC, and have this PC run them without its owner, OS, or anyone else being able to scoop at the data.
Doing so without a massive performance penalty is an active research project. There has been at least one project having managed this, but it still has limitations:
It's windows-only
The CPU has access to the key (ie, you have to trust Intel)
It's optimised for cloud scenarios. If you want to install this to multiple PCs, you need to provide the key in a secure way (ie just go there and type it yourself) to one of the PCs you're going to install your application, and this PC should be able to securely propagate the key to the other PCs.
Andy's suggestion on using the TPM has similar implications to points 2 and 3.

Sounds like Docker is not the right tool, because it was never intended to be used as a full-blown sandbox (at least based on what I've been reading). Why aren't you using a more full-blown VirtualBox approach? At least then you're able to lock up the virtual machine behind logins (as much as a physical installation on someone else's computer can be locked up) and run it isolated, encrypted filesystems and the whole nine yards.
You can either go lightweight and open, or fat and closed. I don't know that there's a "lightweight and closed" option.

I have exactly the same problem. Currently what I was able to discover is bellow.
A. Asylo(https://asylo.dev)
Asylo requires programs/algorithms to be written in C++.
Asylo library is integrated in docker and it seems to be feаsable to create custom dоcker image based on Asylo .
Asylo depends on many not so popular technologies like "proto buffers" and "bazel" etc. To me it seems that learning curve will be steep i.e. the person who is creating docker images/(programs) will need a lot of time to understand how to do it.
Asylo is free of charge
Asylo is bright new with all the advantages and disadvantages of being that.
Asylo is produced by Google but it is NOT an officially supported Google product according to the disclaimer on its page.
Asylo promises that data in trusted environment could be saved even from user with root privileges. However, there is lack of documentation and currently it is not clear how this could be implemented.
B. Scone(https://sconedocs.github.io)
It is binded to INTEL SGX technology but also there is Simulation mode(for development).
It is not free. It has just a small set of functionalities which are not paid.
Seems to support a lot of security functionalities.
Easy for use.
They seems to have more documentation and instructions how to build your own docker image with their technology.

For the Python part, you might consider using Pyinstaller, with appropriate options, it can pack your whole python app in a single executable file, which will not require python installation to be run by end users. It effectively runs a python interpreter on the packaged code, but it has a cipher option, which allows you to encrypt the bytecode.
Yes, the key will be somewhere around the executable, and a very savvy costumer might have the means to extract it, thus unraveling a not so readable code. It's up to you to know if your code contains some big secret you need to hide at all costs. I would probably not do it if I wanted to charge big money for any bug solving in the deployed product. I could use it if client has good compliance standards and is not a potential competitor, nor is expected to pay for more licenses.
While I've done this once, I honestly would avoid doing it again.
Regarding the C code, if you can compile it into executables and/or shared libraries can be included in the executable generated by Pyinstaller.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.