For my dissertation at university, I'm working on a coding leaderboard system where users can compile/run untrusted code in temporary Docker containers. The system seems to be working well so far, but one problem I'm facing is that when code containing an infinite loop is submitted, e.g.:
while True:
print "infinite loop"
the system goes haywire. The problem is that when I create a new Docker container, the Python interpreter prevents Docker from killing the child container because data is still being printed to STDOUT (forever). This leads to the huge vulnerability of Docker eating up all available system resources until the host machine completely freezes.
So my question is: is there a better way of setting a timeout on a Docker container than my current method, one that will actually kill the container and make my system secure (code originally taken from here)?
#!/bin/bash
set -e
to=$1
shift
# run detached so the container id is returned and can be waited on, killed and inspected
cont=$(docker run -d "$@")
code=$(timeout "$to" docker wait "$cont" || true)
docker kill $cont &> /dev/null
echo -n 'status: '
if [ -z "$code" ]; then
echo timeout
else
echo exited: $code
fi
echo output:
# pipe to sed simply for pretty nice indentation
docker logs $cont | sed 's/^/\t/'
docker rm $cont &> /dev/null
Edit: The default timeout in my application (passed to the $to variable) is "10s" / 10 seconds.
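For reference, the wrapper above is invoked along the lines of the following (the script name run_with_timeout.sh and the mounted paths here are just placeholders for whatever my system actually uses):
# First argument is the timeout; everything after it is passed straight to docker run.
./run_with_timeout.sh 10s -v "/path/to/submission":/usercode coderunner-image python /usercode/solution.py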
I've tried looking into adding a timer and sys.exit() to the Python source directly, but this isn't really a viable option, as it seems rather insecure: a user could submit code that prevents it from executing, so the problem would still persist. Oh, the joys of being stuck on a dissertation... :(
You could set up your container with a ulimit on the max CPU time, which will kill the looping process. A malicious user can get around this, though, if they're root inside the container.
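As a rough sketch (the image name, paths and limit value are placeholders, not your actual setup): --ulimit cpu= sets RLIMIT_CPU in seconds, so a busy-looping process is signalled by the kernel once it has burned that much CPU time, which also covers the infinite print loop above.
# Kill the submission's process after roughly 10 seconds of CPU time, regardless of output.
docker run --rm --ulimit cpu=10 -v "/path/to/submission":/usercode coderunner-image python /usercode/solution.py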
There's another S.O. question, "Setting absolute limits on CPU for Docker containers" that describes how to limit the CPU consumption of containers. This would allow you to reduce the effect of malicious users.
I agree with Abdullah, though, that you ought to be able to docker kill the runaway from your supervisor.
If you want to run the containers without providing any protection inside them, you can use runtime constraints on resources.
In your case, -m 100M --cpu-quota 50000 might be reasonable.
That way it won't eat up the parent's system resources until you get around to killing it.
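For example, something along these lines (the image, mount and command are placeholders; the values are the ones suggested above, i.e. 100 MB of memory and roughly half a CPU with the default 100 ms period):
docker run --rm -m 100M --cpu-quota 50000 -v "/path/to/submission":/usercode coderunner-image ./run.sh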
I have arrived at a solution for this problem.
First, you must kill the Docker container when the time limit is reached:
#!/bin/bash
set -e
did=$(docker run -it -d -v "/my_real_path/$1":/usercode virtual_machine ./usercode/compilerun.sh 2>> $1/error.txt)
sleep 10 && docker kill $did &> /dev/null && echo -n "timeout" >> $1/error.txt &
docker wait "$did" &> /dev/null
docker rm -f $ &> /dev/null
The container runs in detached mode (-d option), so it runs in the background.
Then you run sleep also in the background.
Then wait for the container to stop. If it doesn't stop in 10 seconds (the sleep timer), the container will be killed.
As you can see, the docker run process calls a script named compilerun.sh:
#!/bin/bash
gcc -o /usercode/file /usercode/file.c 2> /usercode/error.txt && ./usercode/file < /usercode/input.txt | head -c 1M > /usercode/output.txt
maxsize=1048576
actualsize=$(wc -c <"/usercode/output.txt")
if [ $actualsize -ge $maxsize ]; then
echo -e "1MB file size limit exceeded\n\n$(cat /usercode/output.txt)" > /usercode/output.txt
fi
It starts by compiling and running a C program (that's my use case; I am sure the same can be done for the Python interpreter).
This part:
command | head -c 1M > /usercode/output.txt
is responsible for limiting the output size: it allows the output to be at most 1 MB.
After that, I just check whether the file has reached 1 MB. If it has, I write a message at the beginning of the output file.
The --stop-timeout option does not kill the container when the timeout is exceeded.
Instead, use --ulimit cpu=<seconds> to kill the container when the CPU-time limit is exceeded.
This is based on the CPU time for the process inside the container.
I guess you can use signals in Python, as on Unix, to set a timeout. You can set an alarm for a specific time, say 50 seconds, and catch it. The following link might help you:
signals in python
Use the --stop-timeout option when running your Docker container. This will issue SIGKILL once the timeout has occurred.
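For what it's worth, a sketch of how that flag is passed (the container and image names are placeholders); note that --stop-timeout governs how long Docker waits between SIGTERM and SIGKILL when docker stop is issued, so something still has to issue the stop:
docker run -d --stop-timeout 10 --name submission coderunner-image ./run.sh
docker stop submission   # SIGTERM first, then SIGKILL after 10 seconds if it is still running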
I am trying to execute a Python program as a background process inside a container with kubectl as below (kubectl issued on local machine):
kubectl exec -it <container_id> -- bash -c "cd some-dir && (python xxx.py --arg1 abc &)"
When I log in to the container and check ps -ef, I do not see this process running. Also, there is no output from the kubectl command itself.
Is the kubectl command issued correctly?
Is there a better way to achieve the same?
How can I see the output/logs printed by the background process being run?
If I need to stop this background process after some duration, what is the best way to do this?
The nohup Wikipedia page can help; you need to redirect all three IO streams (stdout, stdin and stderr) - an example with yes:
kubectl exec pod -- bash -c "yes > /dev/null 2> /dev/null &"
nohup is not required in the above case because I did not allocate a pseudo terminal (no -t flag) and the shell was not interactive (no -i flag) so no HUP signal is sent to the yes process on session termination. See this answer for more details.
Redirecting /dev/null to stdin is not required in the above case since stdin already refers to /dev/null (you can see this by running ls -l /proc/YES_PID/fd in another shell).
To see the output you can instead redirect stdout to a file.
To stop the process you'd need to identify the PID of the process you want to stop (pgrep could be useful for this purpose) and send a fatal signal to it (kill PID, for example).
If you want to stop the process after a fixed duration, timeout might be a better option.
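For example, a sketch combining both suggestions (the log path and the 300-second duration are placeholders):
# Write the script's output to a file so it can be inspected later, and let
# `timeout` stop it after 300 seconds; no -t flag, so no pseudo terminal is allocated.
kubectl exec pod -- bash -c "cd some-dir && timeout 300 python xxx.py --arg1 abc > /tmp/xxx.log 2>&1 &"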
Actually, the best way to do this kind of thing is to add an entrypoint to your container and execute the commands there.
Like:
entrypoint.sh:
#!/bin/bash
set -e
cd some-dir && (python xxx.py --arg1 abc &)
./somethingelse.sh
exec "$#"
That way you wouldn't need to go into every single container manually and run the command.
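The Dockerfile side of that could look roughly like this (the paths and the final CMD are placeholders for whatever the container's main process should be; the exec "$@" at the end of entrypoint.sh then hands control over to whatever CMD specifies):
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["python", "main.py"]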
This is my bash script used in CMD:
#!/bin/bash
set -eo pipefail
echo "Setting trap"
echo $$
echo $BASHPID
trap 'cleanup' TERM
trap 'cleanup' KILL
cleanup() {
echo "Cleaning up..."
kill -TERM `jobs -p`
}
# To start the essential services
service ntp start
service awslogs start
cd /app
python -m job_manager &
wait
The Dockerfile is not very interesting:
FROM ubuntu:16.04
RUN apt-get update --fix-missing && apt-get install -y \
git \
python \
python-pip \
ntp \
curl
ENV APP_HOME /app
RUN mkdir -p ${APP_HOME}
COPY src/ ${APP_HOME}/
# job-cmd.sh is kept here
COPY docker/helper-files/* /
CMD /job-cmd.sh
The idea is to trap the TERM signal inside job-cmd.sh and then pass it on to the Python task.
I have tried a number of times and it did not work. After I added these calls:
echo $$
echo $BASHPID
I realised the PID of the CMD process is actually 7 instead of 1, as I would have expected.
My questions:
1) Why is the bash process assigned PID 7?
2) How can I fix my job script/Dockerfile?
I think this is happening because you are using the shell form of the CMD instruction. From https://docs.docker.com/engine/reference/builder/#cmd:
If you want to run your command without a shell then you must express the command as a JSON array and give the full path to the executable. This array form is the preferred format of CMD.
So, replace your CMD instruction in Dockerfile with:
CMD ["/job-cmd.sh"]
Then your Bash process will be assigned PID 1. Your TERM handler will work, but you can't trap the KILL signal. From man trap:
Trapping SIGKILL or SIGSTOP is syntactically accepted by some historical implementations, but it has no effect. Portable POSIX applications cannot attempt to trap these signals.
FYI, I explained more about the PID 1 problem here: https://serverfault.com/questions/869543/bash-script-entrypoint-pid-1-kills-tail-sub-process-only-if-a-fake-trap-whi/870872#870872
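A quick way to check the effect (the container name is a placeholder):
# With CMD ["/job-cmd.sh"], the script itself runs as PID 1 inside the container:
docker exec mycontainer cat /proc/1/cmdline   # should show /bin/bash /job-cmd.sh (NUL-separated)
# docker stop sends SIGTERM to PID 1, so the trap in job-cmd.sh now fires:
docker stop mycontainer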
You could use the trap command in bash to do this.
#!/bin/bash
#
function gracefulShutdown {
echo "Shutting down!"
# do something..
}
trap gracefulShutdown SIGTERM TERM INT
./subprocess.sh &
tail --pid=${!} -f /dev/null &
wait "${!}"
The tail command just waits for the subprocess to complete, while the wait command waits for tail to complete. Now the main process is the one doing the waiting, so any Docker signals directly reach the trap we set above.
Example is available at: https://github.com/iamdvr/docker-trap-subprocess
Specifically, I'm trying to use Fabric to run some tests which rely on the existence of a MongoDB instance.
I have the following code:
db_cmd = 'mongod'
test_cmd = 'istanbul cover node_modules/mocha/bin/_mocha -- -R spec'
pid = os.spawnl(os.P_NOWAIT, db_cmd)
with shell_env(NODE_ENV='test'):
local(test_cmd)
I plan to use the PID to kill the process after test_cmd has finished; however, I've not gotten that far yet.
The running of test_cmd results in an error suggesting that db_cmd has exited and that MongoDB is no longer available:
Uncaught Error: failed to connect to [localhost:27017]
However running mongod manually before running fabric causes test_cmd to run fine and interact with MongoDB.
I suspect I'm just not understanding os.spawnl. Note that this code needs to run on Windows, Linux and OS X, so I think I'm somewhat restricted in which os.spawnxxx methods I can use. I'm also interested to know if there's a Fabric method to achieve this as well, though.
I successfully use:
os.killpg(process.pid, signal.SIGTERM)
Probably, you need to use the subprocess module for that.
To run mongo in background use:
import os
import signal
import subprocess

process = subprocess.Popen(
    command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=True, preexec_fn=os.setsid
)
To kill it after the tests, use the command I've written first.
command is a string containing your mongo start command, for example:
mongod --host localhost --port 27018
It works fine for me. If you have problems with the code, please let me know.
You can also do this in straight bash with jobs and traps:
#!/bin/bash
trap "kill %1" SIGINT SIGTERM EXIT
mongod --host localhost --port 27018 &
istanbul cover node_modules/mocha/bin/_mocha -- -R spec
exit 0
What this is doing:
Set a trap on signals, SIGINT SIGTERM EXIT, to kill the first background job
Make a mongod instance, and throw it into the background (the first one)
Run the tests
Trigger the exit signal
So this will set up and tear down your mongod instance on completion, even on a TERM signal or exception.
I have a Python script I'd like to start on startup on an Ubuntu EC2 instance, but I'm running into trouble.
The script runs in a loop and takes care of exiting when it's ready, so I shouldn't need to start or stop it after it's running.
I've read and tried a lot of approaches with various degrees of success, and honestly I'm confused about what the best approach is. I've tried putting a shell script that starts the Python script in /etc/init.d, making it executable and doing update-rc.d to try to get it to run, but it's failed at every stage.
Here are the contents of the script I've tried:
#!/bin/bash
cd ~/Dropbox/Render\ Farm\ 1/appleseed/bin
while :
do
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
I then did:
sudo chmod +x /etc/init.d/script_name
sudo update-rc.d /etc/init.d/script_name defaults
This doesn't seem to run on startup and I can't see why; if I run the command manually it works as expected.
I also tried adding a line to rc.local to start the script, but that doesn't seem to work either.
Can anybody share what they have found to be the simplest way to run a Python script in the background, with arguments, on startup of an EC2 instance?
UPDATE: ----------------------
I've since moved this code to a file called /home/ubuntu/bin/watch_folder_start
#!/bin/bash
cd /home/ubuntu/Dropbox/Render\ Farm\ 1/appleseed/bin
while :
do
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
and changed my rc.local file to this:
nohup /home/ubuntu/bin/watch_folder_start &
exit 0
This works when I manually run rc.local but won't fire on startup. I did chmod +x rc.local, but that didn't change anything.
Your /etc/init.d/script_name is missing the plumbing that update-rc.d and so on use, and won't properly handle stop, start, and other init-variety commands, so...
For initial experimentation, take advantage of the /etc/init.d/rc.local script (which should be linked to by default from /etc/rc2.d/S99rc.local). This gets you out of having to worry about the init.d conventions; just add things to /etc/rc.local before the exit 0 at its end.
Additionally, that ~ isn't going to be defined; you'll need to use a full pathname - and furthermore the script will run as root. We'll address how to avoid this, if desired, in a bit. In any of these, you'll need to replace "whoeveryouare" with something more useful. Also be warned that you may need to prefix the python command with a su command and some arguments to get the process to run with the user id you might need.
You might try (in /etc/rc.local):
( if cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' ; then
while : ; do
# This loop should respawn watchfolder18.py if it dies, but
# ideally one should fix watchfolder18.py and remove this loop.
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
else
echo warning: could not find watchfolder 1>&2
fi
) &
You could also put all that in a script and just call it from /etc/rc.local.
The first pass is roughly what you had, but if we assume that watchfolder18.py will arrange to avoid dying we can cut it down to:
( cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' \
&& exec python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/ ) &
These aren't all that pretty, but it should let you get your daemon sorted out so you can debug it and so on, then come back to making a proper /etc/init.d or /etc/init script later. Something like this might work in /etc/init/watchfolder.conf, but I'm not yet facile enough to claim this is anything other than a rough stab at it:
# watchfolder - spawner for watchfolder18.py
description "watchfolder program"
start on runlevel [2345]
stop on runlevel [!2345]
script
if cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' ; then
exec python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
fi
end script
I found that the best solution in the end was to use 'upstart' and create a file in /etc/init called myfile.conf that contained the following:
description "watch folder service"
author "Jonathan Topf"
start on startup
stop on shutdown
# Automatically Respawn:
respawn
respawn limit 99 5
script
HOST=`hostname`
chdir /home/ubuntu/Dropbox/Render\ Farm\ 1/appleseed/bin
exec /usr/bin/python ./watchfolder.py -t ./appleseed.cli -u $HOST ../../data/ >> /home/ubuntu/bin/ec2_server.log 2>&1
echo "watch_folder started"
end script
More info on using the upstart system here:
http://upstart.ubuntu.com/
https://help.ubuntu.com/community/UbuntuBootupHowto
http://blog.joshsoftware.com/2012/02/14/upstart-scripts-in-ubuntu/
Context
I'm adding a few pieces to an existing, working system.
There is a control machine (a local Linux PC) running some test scripts which involve sending lots of commands to several different machines remotely via SSH. The test framework is written in Python and uses Fabric to access the different machines.
All commands are handled with a generic calling function, simplified below:
def cmd(host, cmd, args):
...
with fabric.api.settings(host_string=..., user='root', use_ssh_config=True, disable_known_hosts=True):
return fabric.api.run('%s %s' % (cmd, args))
The actual commands sent to each machine usually involve running an existing Python script on the remote side. Those Python scripts do some jobs which include invoking external commands (using system and subprocess). The run() command called on the test PC will return when the remote Python script is done.
At one point I needed one of those remote Python scripts to launch a background task: starting an openvpn server and client using openvpn --config /path/to/config.openvpn. In a normal Python script I would just use &:
system('openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 &')
When this script is called remotely via Fabric, one must explicitly use nohup, dtach, screen and the like to run the job in the background. I got it working with:
system("nohup openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 < /dev/null &")
The Fabric FAQ goes into some details about this.
It works fine for certain background commands.
Problem: doesn't work for all types of background commands
This technique doesn't work for all the commands I need. In some scripts, I need to launch a background atop command (it's a top on steroids) and redirect its stdout to a file.
My code (note: using atop -P for parseable output):
system('nohup atop -P%s 1 < /dev/null | grep %s > %s 2>&1 &' % (dataset, grep_options, filename))
When the script containing that command is called remotely via Fabric, the atop process is immediately killed. The output file is generated but it's empty. Calling the same script while logged in the remote machine by SSH works fine, the atop command dumps data periodically in my output file.
Some googling and digging around brought me to interesting information about background jobs using Fabric, but my problem seems to be specific only to certain types of background jobs. I've tried:
appending sleep
running with pty=False
replacing nohup with dtach -n: same symptoms
I read about commands like top failing in Fabric with stdin redirected to /dev/null; I'm not quite sure what to make of it. I played around with different combinations of (non-)redirection of STDIN, STDOUT and STDERR.
Looks like I'm running out of ideas.
Fabric seems overkill for what we are doing. We don't even use the "fabfile" method because it's integrated in a nose framework and I run them invoking nosetests. Maybe I should resort to dropping Fabric in favor of manual SSH commands, although I don't like the idea of changing a working system because of it not supporting one of my newer modules.
In my environment, it looks like it is working:
from fabric.api import sudo
def atop():
sudo('nohup atop -Pcpu 1 </dev/null '
'| grep cpu > /tmp/log --line-buffered 2>&1 &',
pty=False)
result:
fabric:~$ fab atop -H web01
>>>[web01] Executing task 'atop'
>>>[web01] sudo: nohup atop -Pcpu 1 </dev/null | grep cpu > /tmp/log --line-buffered 2>&1 &
>>>
>>>Done.
web01:~$ cat /tmp/log
>>>cpu web01 1374246222 2013/07/20 00:03:42 361905 100 0 5486 6968 0 9344927 3146 0 302 555 0 2494 100
>>>cpu web01 1374246223 2013/07/20 00:03:43 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
>>>cpu web01 1374246224 2013/07/20 00:03:44 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
...
The atop command may need superuser privileges. This doesn't work:
from fabric.api import run
def atop():
run('nohup atop -Pcpu 1 </dev/null '
'| grep cpu > /tmp/log --line-buffered 2>&1 &',
pty=False)
On the other hand, this works:
from fabric.api import run
def atop():
run('sudo nohup atop -Pcpu 1 </dev/null '
'| grep cpu > /tmp/log --line-buffered 2>&1 &',
pty=False)