I am running airflow standalone as a local development environment. I followed the instructions provided by Airflow to set up the environment, but now I'd like to shut it down in the most graceful way possible.
I ran the standalone command in a terminal, and so my first attempt was to simply use Ctrl+C. It looks promising:
triggerer | [2022-02-02 10:44:06,771] {triggerer_job.py:251} INFO - 0 triggers currently running
^Cstandalone | Shutting down components
However, even 10 minutes later, the shutdown is still in progress, with no more messages in the terminal. I used Ctrl+C again and got a KeyboardInterrupt. Did I do this the wrong way? Is there a better way to shut down the standalone environment?
You could try the following (in bash):
pkill --signal 2 -u $USER airflow
or
pkill --signal 15 -u $USER airflow
or
pkill --signal 9 -u $USER airflow
Say what?
Here's more description of each part:
pkill - Kills processes by matching their names (and other criteria).
--signal - Tells pkill which 'signal' to send to the process.
2 | 15 | 9 - The numeric ID of the signal to send.
2 = SIGINT, which is like CTRL + C.
15 = SIGTERM, the default for pkill.
9 = SIGKILL, which doesn't mess around with gracefully ending a process.
For more info, run kill -L in your bash terminal.
-u - Tells pkill to only match processes whose real user ID is listed.
$USER - The current session user environment variable. This may be different on your system, so adjust accordingly.
airflow - The selection criteria or pattern to match process names against.
See the pgrep info page for more detail on the options available.
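For example, a minimal graceful-first escalation sketch in bash (a sketch only, assuming all the relevant processes match the name airflow):
#!/bin/bash
# ask the airflow processes to shut down gracefully (SIGINT, like Ctrl+C)
pkill --signal 2 -u "$USER" airflow
# give the components some time to clean up
sleep 30
# force-kill anything still hanging around
pgrep -u "$USER" airflow > /dev/null && pkill --signal 9 -u "$USER" airflow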
For my dissertation at University, I'm working on a coding leaderboard system where users can compile / run untrusted code through temporary docker containers. The system seems to be working well so far, but one problem I'm facing is that when code for an infinite loop is submitted, E.g:
while True:
    print "infinite loop"
the system goes haywire. The problem is that when I'm creating a new docker container, the Python interpreter prevents docker from killing the child container as data is still being printed to STDOUT (forever). This leads to the huge vulnerability of docker eating up all available system resources until the machine running the system completely freezes.
So my question is, is there a better way of setting a timeout on a docker container than my current method that will actually kill the docker container and make my system secure (code originally taken from here)?
#!/bin/bash
set -e
to=$1
shift
# run detached so $cont captures the container ID
cont=$(docker run -d "$@")
code=$(timeout "$to" docker wait "$cont" || true)
docker kill $cont &> /dev/null
echo -n 'status: '
if [ -z "$code" ]; then
    echo timeout
else
    echo exited: $code
fi
echo output:
# pipe to sed simply for pretty nice indentation
docker logs $cont | sed 's/^/\t/'
docker rm $cont &> /dev/null
Edit: The default timeout in my application (passed to the $to variable) is "10s" / 10 seconds.
I've tried looking into adding a timer and sys.exit() to the python source directly, but this isn't really a viable option as it seems rather insecure because the user could submit code to prevent it from executing, meaning the problem would still persist. Oh the joys of being stuck on a dissertation... :(
You could set up your container with a ulimit on the max CPU time, which will kill the looping process. A malicious user can get around this, though, if they're root inside the container.
There's another S.O. question, "Setting absolute limits on CPU for Docker containers" that describes how to limit the CPU consumption of containers. This would allow you to reduce the effect of malicious users.
I agree with Abdullah, though, that you ought to be able to docker kill the runaway from your supervisor.
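As a rough sketch of the ulimit idea (the image and the 10-second cap are just examples), a busy loop dies once it has burned through its CPU-time allowance:
# cap CPU time at 10 seconds; the kernel kills the process beyond that
docker run --rm --ulimit cpu=10 python:3 python3 -c 'while True: pass'
Note this limits CPU time, not wall-clock time, so a process that merely sleeps forever would not be caught by it.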
If you want to run the containers without providing any protection inside them, you can use runtime constraints on resources.
In your case, -m 100M --cpu-quota 50000 might be reasonable.
That way it won't eat up the parent's system resources until you get around to killing it.
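For example (the image name and command here are placeholders for your own sandbox setup):
# limit the container to 100 MB of RAM and roughly 50% of one CPU core
# my-sandbox-image and run_usercode.sh are placeholders
docker run --rm -m 100M --cpu-quota 50000 my-sandbox-image ./run_usercode.sh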
I have achieved a solution for this problem.
First, you must kill the docker container when the time limit is reached:
#!/bin/bash
set -e
did=$(docker run -it -d -v "/my_real_path/$1":/usercode virtual_machine ./usercode/compilerun.sh 2>> $1/error.txt)
sleep 10 && docker kill $did &> /dev/null && echo -n "timeout" >> $1/error.txt &
docker wait "$did" &> /dev/null
docker rm -f $did &> /dev/null
The container runs in detached mode (-d option), so it runs in the background.
Then you run sleep also in the background.
Then you wait for the container to stop. If it doesn't stop within 10 seconds (the sleep timer), the container will be killed.
As you can see, the docker run process calls a script named compilerun.sh:
#!/bin/bash
gcc -o /usercode/file /usercode/file.c 2> /usercode/error.txt && ./usercode/file < /usercode/input.txt | head -c 1M > /usercode/output.txt
maxsize=1048576
actualsize=$(wc -c <"/usercode/output.txt")
if [ "$actualsize" -ge "$maxsize" ]; then
    echo -e "1MB file size limit exceeded\n\n$(cat /usercode/output.txt)" > /usercode/output.txt
fi
It starts by compiling and running a C program (that's my use case; I am sure the same can be done for a Python interpreter).
This part:
command | head -c 1M > /usercode/output.txt
is responsible for enforcing the maximum output size: it allows the output to be 1MB at most.
After that, I just check whether the file is 1MB. If it is, I write a message at the beginning of the output file.
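As a quick way to see the truncation in action (using yes as a stand-in for the user's endlessly-printing program):
# yes prints "y" forever; head cuts the stream off at exactly 1MB
yes | head -c 1M > /tmp/output.txt
wc -c < /tmp/output.txt    # prints 1048576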
The --stop-timeout option is not killing the container if the timeout is exceeded.
Instead, use --ulimit cpu=<seconds> to kill the container if the timeout is exceeded.
This is based on the CPU time for the process inside the container.
I guess you can use signals in Python, as on Unix, to set a timeout. You can set an alarm for a specific time, say 50 seconds, and catch it. The following link might help you:
signals in python
Use the --stop-timeout option when running your docker container. This will execute SIGKILL once the timeout has occurred.
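For reference, a sketch of how --stop-timeout behaves (the image name is a placeholder). The timeout is the grace period applied when the container is stopped, e.g. by docker stop, rather than an absolute lifetime for the container:
# give the container a 10-second grace period between SIGTERM and SIGKILL
docker run -d --name job --stop-timeout 10 my-image
# sends SIGTERM immediately, then SIGKILL once the 10 seconds elapse
docker stop job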
I am trying to set up a crontab to run 6 python data scrapers. I am tired of having to restart them manually when one of them fails. When running the following:
> ps -ef | grep python
ubuntu 31537 1 0 13:09 ? 00:00:03 python /home/ubuntu/scrapers/datascraper1.py
etc... I get a list of the datascrapers 1-6 all in the same folder.
I edited my crontab like this:
sudo crontab -e
# m h dom mon dow command
* * * * * pgrep -f /home/ubuntu/scrapers/datascraper1.py || python /home/ubuntu/scrapers/datascraper1.py > test.out
Then I hit Ctrl+X to exit and chose yes to save as /tmp/crontab.M6sSxL/crontab.
However it does not work in restarting or even starting datascraper1.py whether I kill the process manually or if the process fails on its own. Next, I tried reloading cron but it still didn't work:
sudo cron reload
Finally I tried removing nohup from the cron statement and that also did not work.
How can I check if a cron.allow or cron.deny file exists?
Also, do I need to add a username before pgrep? I am also not sure what the "> test.out" is doing at the end of the cron statement.
After running
grep CRON /var/log/syslog
to check to see if cron ran at all, I get this output:
ubuntu#ip-172-31-29-12:~$ grep CRON /var/log/syslog
Jan 5 07:01:01 ip-172-31-29-12 CRON[31101]: (root) CMD (pgrep -f datascraper1.py || python /home/ubuntu/scrapers/datascraper1.py > test.out)
Jan 5 07:01:01 ip-172-31-29-12 CRON[31100]: (CRON) info (No MTA installed, discarding output)
Jan 5 07:17:01 ip-172-31-29-12 CRON[31115]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 5 08:01:01 ip-172-31-29-12 CRON[31140]: (root) CMD (pgrep -f datascraper1.py || python /home/ubuntu/scrapers/datascraper1.py > test.out)
Since there is evidence of Cron executing the command, there must be something wrong with this command, (note: I added the path to python):
pgrep -f datascraper1.py || /usr/bin/python /home/ubuntu/scrapers/datascraper1.py > test.out
This is supposed to check whether datascraper1.py is running and, if not, restart it.
Since Cron is literally executing this statement:
(root) CMD (pgrep -f datascraper1.py || python /home/ubuntu/scrapers/datascraper1.py > test.out)
aka
root pgrep -f datascraper1.py
Running the above root command gives me:
The program 'root' is currently not installed. You can install it by typing:
sudo apt-get install root-system-bin
Is there a problem with Cron running commands from root?
Thanks for your help.
First of all, you need to see if cron is working at all.
Add this to your cron file (and ideally delete the python statement for now, to have a clear state)
* * * * * echo `date` >>/home/your_username/hello_cron
This will output the date to the file "hello_cron" every minute. Try this, and if it works, i.e. you see output every minute, write here and we can troubleshoot further.
You can also look in your system logs to see if cron has run your command, like so:
grep CRON /var/log/syslog
Btw the >test.out part would redirect the output of the python program to the file test.out. I am not sure why you need the nohup part - this would let the python programs run even if you are logged out - is this what you want?
EDIT: After troubleshooting cron:
The message about no MTA installed means that cron is trying to send you an e-mail with the output of the job, but cannot because you don't have an email program installed.
Maybe this will fix it:
sudo apt-get install postfix
The line invoking the python program in cron is producing some output (an error) so it's in your best interests to see what happens. Look at this tutorial to see how to set your email address: http://www.cyberciti.biz/faq/linux-unix-crontab-change-mailto-settings/
Just in case the tutorial becomes unavailable:
MAILTO=youremail@example.com
You need to add the Python home to your PATH at the start of the job, however you have Python set up. When you run it yourself and type python, the shell resolves the binary through your $PATH. So, the Python home (where the python binary is) needs to be globally exported for the user that owns the cron job (so, put it in an rc script in /etc/rc.d/), or you need to append the Python home to PATH at the start of the cron job. So,
export PATH=$PATH:<path to python>
Or, write the cron entry as
/usr/bin/python /home/ubuntu/etc/etc
to call it directly. It might not be /usr/bin; run the command
'which python'
to find out.
The 'No MTA' message means the job is writing to STDERR, which would normally get mailed to the user, but can't be because you have no Mail Transfer Agent set up (like mailx or mutt), so no mail from cron can be delivered and the output is discarded. If you'd like STDERR to go into the log also, at the end, instead of
"command" > test.out
write
"command" 2>&1 > test.out
to redirect STDERR into STDOUT, then redirect both to test.out.
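One more caveat: cron runs the line through sh -c, and that wrapper shell's own command line contains the pattern, so a plain pgrep -f datascraper1.py can match the shell itself and the restart will never fire. A common workaround is the bracket trick; a corrected entry might look something like this (assuming which python reported /usr/bin/python, and using an absolute path for test.out):
* * * * * pgrep -f '[d]atascraper1.py' > /dev/null || /usr/bin/python /home/ubuntu/scrapers/datascraper1.py > /home/ubuntu/test.out 2>&1
The regex [d]atascraper1.py still matches the running script's command line, but not the literal string [d]atascraper1.py in the wrapper shell's own command line.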
I added a bottle server that uses python's cassandra library, but it exits with this error:
Bottle FATAL Exited too quickly (process log may have details)
The log shows this:
File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1765, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
So I tried to run it manually using supervisorctl start Bottle, and then it started with no issue. The conclusion: the Bottle service starts too fast (before the Cassandra service it depends on is up), so a delay is needed!
This is what I use:
[program:uwsgi]
command=bash -c 'sleep 5 && uwsgi /etc/uwsgi.ini'
Not happy enough with the sleep hack, I created a startup script and launched supervisorctl start processname from there.
[program:startup]
command=/startup.sh
startsecs = 0
autostart = true
autorestart = false
startretries = 1
priority=1
[program:myapp]
command=/home/website/venv/bin/gunicorn /home/website/myapp/app.py
autostart=false
autorestart=true
process_name=myapp
startup.sh
#!/bin/bash
sleep 5
supervisorctl start myapp
This way, supervisor fires the startup script once, and that script starts myapp after 5 seconds; note the autostart=false and autorestart=true on myapp.
I had a similar issue where starting 64 python rq-worker processes using supervisorctl raised CPU and RAM alerts at every restart. What I did was the following:
command=/bin/bash -c "sleep %(process_num)02d && virtualenv/bin/python3 manage.py rqworker --name %(program_name)s_my-rq-worker_%(process_num)02d default low"
Basically, before running the python command, I sleep for N seconds, where N is the process number. This means supervisor starts one rq-worker process per second, staggering the load.
I'm using python fabric to deploy binaries to an ec2 server and am attempting to run them in the background (a subshell).
All the fabric commands for performing local actions, putting files, and executing remote commands w/o elevated privileges work fine. The issue I run into is when I attempt to run the binary.
with cd("deploy"):
run('mkdir log')
sudo('iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080', user="root")
result = sudo('./dbserver &', user="root") # <---- This line
print result
if result.failed:
print "Running dbserver failed"
else:
print "DBServer now running server" # this gets printed despite the binary not running
After I log in to the server and run ps aux | grep dbserver, nothing shows up. How can I get fabric to execute the binary? The same command ./dbserver & executed from the shell does exactly what I want it to. Thanks.
This is likely related to TTY issues, and/or the fact that you're attempting to background a process.
Both of these are discussed in the FAQ under these two headings:
http://www.fabfile.org/faq.html#init-scripts-don-t-work
http://www.fabfile.org/faq.html#why-can-t-i-run-programs-in-the-background-with-it-makes-fabric-hang
Try making the sudo call like this:
sudo('nohup ./dbserver &', user='root', pty=False)
Context
I'm adding a few pieces to an existing, working system.
There is a control machine (a local Linux PC) running some test scripts which involve sending lots of commands to several different machines remotely via SSH. The test framework is written in Python and uses Fabric to access the different machines.
All commands are handled with a generic calling function, simplified below:
def cmd(host, cmd, args):
    ...
    with fabric.api.settings(host_string=..., user='root', use_ssh_config=True, disable_known_hosts=True):
        return fabric.api.run('%s %s' % (cmd, args))
The actual commands sent to each machine usually involve running an existing python script on the remote side. Those python scripts do some jobs which include invoking external commands (using system and subprocess). The run() command called on the test PC will return when the remote python script is done.
At one point I needed one of those remote python scripts to launch a background task: starting an openvpn server and client using openvpn --config /path/to/config.openvpn. In a normal python script I would just use &:
system('openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 &')
When this script is called remotely via Fabric, one must explicitly use nohup, dtach, screen and the like to run the job in the background. I got it working with:
system("nohup openvpn --config /path/to/config.openvpn > /var/log/openvpn.log 2>&1 < /dev/null &"
The Fabric FAQ goes into some details about this.
It works fine for certain background commands.
Problem: doesn't work for all types of background commands
This technique doesn't work for all the commands I need. In some scripts, I need to launch a background atop command (it's a top on steroids) and redirect its stdout to a file.
My code (note: using atop -P for parseable output):
system('nohup atop -P%s 1 < /dev/null | grep %s > %s 2>&1 &' % (dataset, grep_options, filename))
When the script containing that command is called remotely via Fabric, the atop process is immediately killed. The output file is generated but it's empty. Calling the same script while logged in the remote machine by SSH works fine, the atop command dumps data periodically in my output file.
Some googling and digging around brought me to interesting information about background jobs using Fabric, but my problem seems to be specific to certain types of background jobs. I've tried:
appending sleep
running with pty=False
replacing nohup with dtach -n: same symptoms
I read about commands like top failing in Fabric with stdin redirected to /dev/null; I'm not quite sure what to make of it. I played around with different combinations of (non-)redirects of STDIN, STDOUT and STDERR.
Looks like I'm running out of ideas.
Fabric seems overkill for what we are doing. We don't even use the "fabfile" method because it's integrated into a nose framework, and I run the tests by invoking nosetests. Maybe I should resort to dropping Fabric in favor of manual SSH commands, although I don't like the idea of changing a working system just because it doesn't support one of my newer modules.
In my environment, the following looks like it works:
from fabric.api import sudo
def atop():
    sudo('nohup atop -Pcpu 1 </dev/null '
         '| grep cpu > /tmp/log --line-buffered 2>&1 &',
         pty=False)
result:
fabric:~$ fab atop -H web01
>>>[web01] Executing task 'atop'
>>>[web01] sudo: nohup atop -Pcpu 1 </dev/null | grep cpu > /tmp/log --line-buffered 2>&1 &
>>>
>>>Done.
web01:~$ cat /tmp/log
>>>cpu web01 1374246222 2013/07/20 00:03:42 361905 100 0 5486 6968 0 9344927 3146 0 302 555 0 2494 100
>>>cpu web01 1374246223 2013/07/20 00:03:43 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
>>>cpu web01 1374246224 2013/07/20 00:03:44 1 100 0 1 0 0 99 0 0 0 0 0 2494 100
...
The atop command may need the superuser. This doesn't work:
from fabric.api import run
def atop():
    run('nohup atop -Pcpu 1 </dev/null '
        '| grep cpu > /tmp/log --line-buffered 2>&1 &',
        pty=False)
On the other hand, this works:
from fabric.api import run
def atop():
    run('sudo nohup atop -Pcpu 1 </dev/null '
        '| grep cpu > /tmp/log --line-buffered 2>&1 &',
        pty=False)