I'm trying to stress test several servers that I can ssh into, by writing a Python script that puts each of them through a reboot loop N times. I call
os.system('reboot')
But I'm not sure how to have the script wait until the server has finished booting and then continue execution. The servers run various distros of Linux. Any help would be great.
You mentioned that the solution doesn't have to be in Python, so you can just use a Bash script for this (given that you can ping the server):
#!/usr/bin/env bash
COUNTER=$1
SERVER=$2
COMMAND="sudo reboot"
SLEEP_DURATION=60
echo "Working on $SERVER $COUNTER times"
while (( COUNTER > 0 )); do
    # Send one ping and wait at most 5 seconds for a reply (-W is GNU ping; BSD/macOS ping uses -t for this)
    ping -c 1 -W 5 "$SERVER"
    _ping_r=$?
    if (( _ping_r == 0 )); then
        echo "Rebooting $SERVER"
        ssh "$SERVER" $COMMAND
        COUNTER=$((COUNTER - 1))
    else
        echo "Couldn't ping $SERVER. Taking a quick nap and trying again."
        sleep 5
    fi
    sleep "$SLEEP_DURATION"
done
echo "Done working on $SERVER"
Save it in something like command_runner.sh and simply call it via ./command_runner.sh 2 server.example.org on a workstation that can SSH and run reboot on the server.
You could use Fabric to SSH into several servers in parallel and execute various commands there (even commands such as reboot -- you may need to disconnect from the servers explicitly in your fabfile.py after such commands).
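If you're on Fabric 1.x, it can also run an ad-hoc command across several hosts in parallel straight from the shell; a minimal sketch (the host names are placeholders, and it assumes you can SSH in as root):
# -P runs in parallel across hosts; everything after the standalone -- is executed via run() on each host
fab -P -H root@host1.example.org,root@host2.example.org -- reboot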
I have an issue with executing an application via /usr/bin/timeout in a bash script.
In this specific case it is a simple Python Fabric script (Fabric version 1.14).
To install this version of the Fabric library, run: pip install "fabric<2"
The issue does not reproduce with the newer Fabric 2.x.
Shell script causing issue:
[root@testhost:~ ] $ cat testNOK.sh
#!/bin/bash
timeout 10 ./test.py
echo "RETCODE=$?"
[root@testhost:~ ] $ ./testNOK.sh
[localhost] run: echo Hello!
RETCODE=124
[root@testhost:~ ] $
A similar script without timeout works fine:
[root@testhost:~ ] $ cat testOK.sh
#!/bin/bash
./test.py
echo "RETCODE=$?"
[root@testhost:~ ] $ ./testOK.sh
[localhost] run: echo Hello!
[localhost] out: Hello!
[localhost] out:
RETCODE=0
[root@testhost:~ ] $
Manual execution from the bash command line with timeout works fine:
[root@testhost:~ ] $ timeout 10 ./test.py && echo "RETCODE=$?"
[localhost] run: echo Hello!
[localhost] out: Hello!
[localhost] out:
RETCODE=0
[root@testhost:~ ] $
Python 2.7 test.py script:
[root@testhost:~ ] $ cat test.py
#!/usr/bin/python
from fabric.api import run, settings
with settings(host_string='localhost', user='root', password='XXXXX'):
    run('echo Hello!')
[root@testhost:~ ] $
I have observed the same behavior on different Linux distributions.
Now the question is: why does an application executed via timeout within a bash script behave differently, and what would be the best solution to this issue?
You need to invoke timeout with the --foreground option:
timeout --foreground 10 ./test.py
This is only required if the timeout command is not executed from an interactive shell (that is, if it's executed from a script file).
Quoting from the timeout info page:
‘--foreground’
Don’t create a separate background program group, so that the
managed COMMAND can use the foreground TTY normally. This is
needed to support timing out commands not started directly from an
interactive shell, in two situations.
1. COMMAND is interactive and needs to read from the terminal, for
   example
2. the user wants to support sending signals directly to COMMAND
   from the terminal (like Ctrl-C for example)
What's actually going on in this case is that fabric (or something it invokes) is calling tcsetattr to turn terminal echo off. I don't know why, but I suppose it has something to do with the process used to (not) collect the user password. (I just saw it in an strace; I made no attempt to find the call.) Attempting to change tty configuration from a background process will cause the process to block until it regains control of the tty, and that's what's happening.
It doesn't happen when timeout is not used because bash doesn't create a background program group. I suppose that fabric 2 avoids the call to tcsetattr.
You could probably also avoid the issue by avoiding password-based SSH authentication but I didn't try that.
You can also avoid the problem by redirecting stdin to /dev/null (either in the timeout command or in the invocation of the shell script). If you don't need to forward stdin to the remote command (and you probably don't), that might also be useful.
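For instance, a minimal sketch of the failing script above with stdin redirected:
#!/bin/bash
# Redirecting stdin keeps the background process group from trying to grab the TTY
timeout 10 ./test.py < /dev/null
echo "RETCODE=$?"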
You can also set a timeout without using bash, just by using the time module in Python:
import time
time.sleep(5)
# change 5 to the number of seconds you need for the timeout
Overview
I'm trying to use python fabric to run an ssh command as root on a remote server.
The command: nohup ./foo &
foo is expected to run for several days. I must be able to disassociate foo from Fabric's remote ssh session and put foo in the background.
The Fabric FAQ says you should use something like screen or tmux when you run your fabric script (which runs the backgrounded command). I tried that, but my fabric script still hung. foo itself is not hanging.
Question
How do I use fabric to run this command on a remote server without the script hanging: nohup ./foo &
Details
This is my script:
#!/bin/sh
# Credit: https://unix.stackexchange.com/a/20895/6766
if "true" : '''\'
then
exec "/nfs/it/network_python/$OSREL/bin/python" "$0" "$#"
exit 127
fi
'''
from getpass import getpass
import os
from fabric import Connection, Config
assert os.geteuid()==0, "ERROR: Must run as root"
for host in ['host1.foo.local', 'host2.foo.local']:
    # Make an ssh connection to the host...
    conn = Connection(host)
    # The script always hangs at this line
    result = conn.run('nohup ./foo &', warn=True, hide=True)
I always open a tmux session to run the aforementioned script in; even doing so, the script hangs when I get to conn.run(), above.
I'm running the script on a vanilla CentOS 6.5 VM; it runs under python 2.7.10 and fabric 2.1.
The Fabric FAQ is unclear... I thought the FAQ wanted tmux used on the local side when I executed the Fabric script.
The correct way to fix this problem is to replace nohup in the remote command with screen -d -m <command>. Now I can run the whole script locally with no hangs (and I don't have to use tmux in the local term).
Explicitly, I have to rewrite the last line of my script in my question as:
# Remove &, and nohup...
result = conn.run('screen -d -m ./foo', warn=True, hide=True)
For my dissertation at University, I'm working on a coding leaderboard system where users can compile / run untrusted code through temporary docker containers. The system seems to be working well so far, but one problem I'm facing is that when code for an infinite loop is submitted, E.g:
while True:
    print "infinite loop"
the system goes haywire. The problem is that when I create a new docker container, the Python interpreter prevents docker from killing the child container, as data is still being printed to STDOUT (forever). This leads to a huge vulnerability: docker eats up all available system resources until the machine running the system completely freezes.
So my question is, is there a better way of setting a timeout on a docker container than my current method that will actually kill the docker container and make my system secure (code originally taken from here)?
#!/bin/bash
set -e
to=$1
shift
cont=$(docker run -d "$@")   # -d so the container runs in the background and its ID is captured
code=$(timeout "$to" docker wait "$cont" || true)
docker kill $cont &> /dev/null
echo -n 'status: '
if [ -z "$code" ]; then
    echo timeout
else
    echo exited: $code
fi
echo output:
# pipe to sed simply for pretty nice indentation
docker logs $cont | sed 's/^/\t/'
docker rm $cont &> /dev/null
Edit: The default timeout in my application (passed to the $to variable) is "10s" / 10 seconds.
I've tried looking into adding a timer and sys.exit() to the python source directly, but this isn't really a viable option as it seems rather insecure because the user could submit code to prevent it from executing, meaning the problem would still persist. Oh the joys of being stuck on a dissertation... :(
You could set up your container with a ulimit on the max CPU time, which will kill the looping process. A malicious user can get around this, though, if they're root inside the container.
There's another S.O. question, "Setting absolute limits on CPU for Docker containers" that describes how to limit the CPU consumption of containers. This would allow you to reduce the effect of malicious users.
I agree with Abdullah, though, that you ought to be able to docker kill the runaway from your supervisor.
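For example, a minimal sketch of that ulimit suggestion (the image and command are placeholders; --ulimit cpu sets RLIMIT_CPU in seconds, so the kernel signals and, by default, terminates the process once it has used that much CPU time):
docker run --ulimit cpu=10 <image> <command>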
If you want to run the containers without providing any protection inside them, you can use runtime constraints on resources.
In your case, -m 100M --cpu-quota 50000 might be reasonable.
That way it won't eat up the parent's system resources until you get around to killing it.
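A minimal sketch of that docker run invocation (image and command are placeholders; with the default 100 ms CPU period, a quota of 50000 microseconds is roughly half a core):
docker run -m 100M --cpu-quota 50000 <image> <command>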
I have achieved a solution for this problem.
First, you must kill the docker container when the time limit is reached:
#!/bin/bash
set -e
did=$(docker run -it -d -v "/my_real_path/$1":/usercode virtual_machine ./usercode/compilerun.sh 2>> $1/error.txt)
sleep 10 && docker kill $did &> /dev/null && echo -n "timeout" >> $1/error.txt &
docker wait "$did" &> /dev/null
docker rm -f $did &> /dev/null
The container runs in detached mode (-d option), so it runs in the background.
Then the sleep && docker kill chain also runs in the background.
Then docker wait waits for the container to stop. If it doesn't stop within 10 seconds (the sleep timer), the container is killed.
As you can see, the docker run process calls a script named compilerun.sh:
#!/bin/bash
gcc -o /usercode/file /usercode/file.c 2> /usercode/error.txt && ./usercode/file < /usercode/input.txt | head -c 1M > /usercode/output.txt
maxsize=1048576
actualsize=$(wc -c <"/usercode/output.txt")
if [ $actualsize -ge $maxsize ]; then
    echo -e "1MB file size limit exceeded\n\n$(cat /usercode/output.txt)" > /usercode/output.txt
fi
It starts by compiling and running a C program (that's my use case; I am sure the same can be done for a Python interpreter).
This part:
command | head -c 1M > /usercode/output.txt
is responsible for limiting the output size: it allows at most 1 MB of output.
After that, I just check whether the file has reached 1 MB. If it has, I write a message at the beginning of the output file.
The --stop-timeout option does not kill the container if the timeout is exceeded.
Instead, use --ulimit cpu=<seconds> to have the container's process killed when the limit is exceeded.
This is based on the CPU time used by the process inside the container.
I guess you can use signals in Python, as on Unix, to set a timeout: set an alarm for a specific time, say 50 seconds, and catch it. The following link might help you:
signals in python
Use the --stop-timeout option while running your docker container. This will send SIGKILL once the timeout has occurred.
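A minimal sketch of that flag's syntax (image and command are placeholders; the timeout applies when the container is being stopped):
docker run --stop-timeout 10 <image> <command>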
I'm using Python Fabric to deploy binaries to an EC2 server and am attempting to run them in the background (a subshell).
All the fabric commands for performing local actions, putting files, and executing remote commands w/o elevated privileges work fine. The issue I run into is when I attempt to run the binary.
with cd("deploy"):
run('mkdir log')
sudo('iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080', user="root")
result = sudo('./dbserver &', user="root") # <---- This line
print result
if result.failed:
print "Running dbserver failed"
else:
print "DBServer now running server" # this gets printed despite the binary not running
After I login to the server and ps aux | grep dbserver nothing shows up. How can I get fabric to execute the binary? The same command ./dbserver & executed from the shell does exactly what I want it to. Thanks.
This is likely related to TTY issues, and/or to the fact that you're attempting to background a process.
Both of these are discussed in the FAQ under these two headings:
http://www.fabfile.org/faq.html#init-scripts-don-t-work
http://www.fabfile.org/faq.html#why-can-t-i-run-programs-in-the-background-with-it-makes-fabric-hang
Try making the sudo like this:
sudo('nohup ./dbserver &', user="root", pty=False)
Context
I'm adding a few pieces to an existing, working system.
There is a control machine (a local Linux PC) running some test scripts which involve sending lots of commands to several different machines remotely via SSH. The test framework is written in Python and uses Fabric to access the different machines.
All commands are handled with a generic calling function, simplified below:
def cmd(host, cmd, args):
    ...
    with fabric.api.settings(host_string=..., user='root', use_ssh_config=True, disable_known_hosts=True):
        return fabric.api.run('%s %s' % (cmd, args))
The actual commands sent to each machine usually involve running an existing python script on the remote side. Those python scripts do some jobs which include invoking external commands (using system and subprocess). The run() command called on the test PC will return when the remote python script is done.
At one point I needed one of those remote python scripts to launch a background task: starting an openvpn server and client using openvpn --config /path/to/config.openvpn. In a normal python script I would just use &:
system('openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 &')
When this script is called remotely via Fabric, one must explicitly use nohup, dtach, screen and the like to run the job in the background. I got it working with:
system("nohup openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 < /dev/null &")
The Fabric FAQ goes into some details about this.
It works fine for certain background commands.
Problem: doesn't work for all types of background commands
This technique doesn't work for all the commands I need. In some scripts, I need to launch a background atop command (it's a top on steroids) and redirect its stdout to a file.
My code (note: using atop -P for parseable output):
system('nohup atop -P%s 1 < /dev/null | grep %s > %s 2>&1 &' % (dataset, grep_options, filename))
When the script containing that command is called remotely via Fabric, the atop process is immediately killed. The output file is generated but it's empty. Calling the same script while logged in to the remote machine over SSH works fine: the atop command dumps data periodically into my output file.
Some googling and digging around brought me to interesting information about background jobs using Fabric, but my problem seems to be only specific to certains types of background jobs. I've tried:
appending sleep
running with pty=False
replacing nohup with dtach -n: same symptoms
I read about commands like top failing in Fabric with stdin redirected to /dev/null, though I'm not quite sure what to make of it. I played around with different combinations of (non-)redirects of STDIN, STDOUT and STDERR.
Looks like I'm running out of ideas.
Fabric seems overkill for what we are doing. We don't even use the "fabfile" method because it's integrated into a nose framework and I run the tests by invoking nosetests. Maybe I should resort to dropping Fabric in favor of manual SSH commands, although I don't like the idea of changing a working system just because it doesn't support one of my newer modules.
In my environment, the following looks like it is working:
from fabric.api import sudo

def atop():
    sudo('nohup atop -Pcpu 1 </dev/null '
         '| grep cpu > /tmp/log --line-buffered 2>&1 &',
         pty=False)
result:
fabric:~$ fab atop -H web01
>>>[web01] Executing task 'atop'
>>>[web01] sudo: nohup atop -Pcpu 1 </dev/null | grep cpu > /tmp/log --line-buffered 2>&1 &
>>>
>>>Done.
web01:~$ cat /tmp/log
>>>cpu web01 1374246222 2013/07/20 00:03:42 361905 100 0 5486 6968 0 9344927 3146 0 302 555 0 2494 100
>>>cpu web01 1374246223 2013/07/20 00:03:43 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
>>>cpu web01 1374246224 2013/07/20 00:03:44 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
...
The atop command may need superuser privileges. This doesn't work:
from fabric.api import run

def atop():
    run('nohup atop -Pcpu 1 </dev/null '
        '| grep cpu > /tmp/log --line-buffered 2>&1 &',
        pty=False)
On the other hand, this works:
from fabric.api import run

def atop():
    run('sudo nohup atop -Pcpu 1 </dev/null '
        '| grep cpu > /tmp/log --line-buffered 2>&1 &',
        pty=False)