Stopping Python container is slow - SIGTERM not passed to python process? - python

I made a simple Python webserver based on this example, which runs inside Docker:
FROM python:3-alpine
WORKDIR /app
COPY entrypoint.sh .
RUN chmod +x entrypoint.sh
COPY src src
CMD ["python", "/app/src/api.py"]
ENTRYPOINT ["/app/entrypoint.sh"]
Entrypoint:
#!/bin/sh
echo starting entrypoint
set -x
exec "$#"
Stopping the container took very long, although the exec statement together with the JSON array syntax should hand the process off directly to Python. I suspected a problem with SIGTERM not being passed to the container. I added the following to my api.py script to detect SIGTERM:
def terminate(signal, frame):
    print("TERMINATING")

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, terminate)
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))
    webServer.serve_forever()
Executed without Docker (python3 api/src/api.py), I tried
kill -15 $(ps -guaxf | grep python | grep -v grep | awk '{print $2}')
to send SIGTERM (15 is its numeric code). The script prints TERMINATING, so my event handler works. Now I run the Docker container using docker-compose and press CTRL + C. Docker says gracefully stopping... (press Ctrl+C again to force) but doesn't print my terminating message from the event handler.
I also tried to run docker-compose in detached mode, then run docker-compose kill -s SIGTERM api and view the logs. Still no message from the event handler.

Since the script runs as PID 1 as desired, and setting init: true in docker-compose.yml doesn't seem to change anything, I took a deeper dive into this topic. This led me to discover multiple mistakes I made:
Logging
The approach of printing a message when SIGTERM is caught was designed as a simple test case, to see if this basically works before I care about stopping the server. But I noticed that no message appears, for two reasons:
Output buffering
When running a long-running process in Python like the HTTP server (or any while True loop, for example), no output is displayed when the container is started attached with docker-compose up (no -d flag). To receive live logs, we need to start Python with the -u flag or set the environment variable PYTHONUNBUFFERED=TRUE.
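For illustration, a minimal sketch (my addition, not from the original post) of the in-code alternative: print's flush parameter pushes each line out immediately, without the -u flag or the environment variable:

import time

while True:
    # flush=True forces the line out immediately instead of leaving it in the stdout buffer
    print("heartbeat", flush=True)
    time.sleep(5)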
No log piping after stop
But the main problem was not the output buffering (this is just a side note, since I wondered why there was no log output from the container). When the container is canceled, docker-compose stops piping logs to the console. This means that, logically, it can't display anything that happens AFTER CTRL + C is pressed.
To fetch those logs, we need to wait until docker-compose has stopped the container and then run docker-compose logs. It prints all logs, including those generated after CTRL + C was pressed. Using docker-compose logs, I found out that SIGTERM is passed to the container and my event handler works.
Stopping the webserver
With this knowledge I tried to stop the webserver instance. At first this didn't work, because it's not enough to just call webServer.server_close(). It's required to exit explicitly after any cleanup work is done, like this:
def terminate(signal, frame):
    print("Start Terminating: %s" % datetime.now())
    webServer.server_close()
    sys.exit(0)
When sys.exit() is not called, the process keeps running, which results in ~10s of waiting before Docker kills it.
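As an aside, if you'd rather avoid sys.exit() inside the handler, a hedged sketch (reusing the names from this post) is to ask serve_forever() to return instead, so cleanup runs in the main flow. Note that shutdown() must run in a separate thread, because it blocks until the serving loop exits, and that loop is paused while the signal handler runs:

import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello")

webServer = HTTPServer(("0.0.0.0", 80), MyServer)

def terminate(signal, frame):
    # hand shutdown() to another thread so serve_forever() can actually return
    threading.Thread(target=webServer.shutdown).start()

signal.signal(signal.SIGTERM, terminate)
webServer.serve_forever()  # returns once shutdown() completes
webServer.server_close()   # cleanup runs here, then the process exits normally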
Full working example
Here is a demo script that implements everything I've learned:
from http.server import BaseHTTPRequestHandler, HTTPServer
import signal
from datetime import datetime
import sys, os

hostName = "0.0.0.0"
serverPort = 80

class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("Hello from Python Webserver", "utf-8"))

webServer = None

def terminate(signal, frame):
    print("Start Terminating: %s" % datetime.now())
    webServer.server_close()
    sys.exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, terminate)
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s with pid %i" % (hostName, serverPort, os.getpid()))
    webServer.serve_forever()
Running in a container, it can now be stopped quickly, without waiting for Docker to kill the process:
$ docker-compose up --build -d
$ time docker-compose down
Stopping python-test_app_1 ... done
Removing python-test_app_1 ... done
Removing network python-test_default
real 0m1,063s
user 0m0,424s
sys 0m0,077s

Docker runs your application, by default, in the foreground as PID 1. The process with PID 1 has a special meaning and specific protections in Linux.
This is highlighted in docker run documentation:
Note
A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. As a result, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
Source: https://docs.docker.com/engine/reference/run/#foreground
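To see this for yourself, here is a throwaway probe script (my sketch, not part of the quoted docs) you can run as the container's main process: without the handlers it ignores docker stop, with them it exits immediately:

import signal
import sys
import time

def report(signum, frame):
    # name the signal, then exit so docker stop returns quickly
    print("received signal %d" % signum, flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, report)
signal.signal(signal.SIGINT, report)

print("probe running, waiting for signals", flush=True)
while True:
    time.sleep(1)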
To fix this when running a single container, you can use the --init flag of docker run:
You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.
Source: https://docs.docker.com/engine/reference/run/#specify-an-init-process
The same configuration is possible in docker-compose, simply by specifying init: true on the container.
An example would be:
version: "3.8"
services:
web:
image: alpine:latest
init: true
Source: https://docs.docker.com/compose/compose-file/#init

Related

How to gracefully stop a Dockerized Python ROS2 node when run with docker-compose up?

I have a Python-based ROS2 node running inside a Docker container and I am trying to handle the graceful shutdown of the node by capturing the SIGTERM/SIGINT signals and/or by catching the KeyboardInterrupt exception.
The problem is when I run the node in a container using docker-compose. I cannot seem to catch the "moment" when the container is being stopped/killed. I've explicitly added the STOPSIGNAL in the Dockerfile and the stop_signal in the docker-compose file.
Here is a sample of the node code:
import signal
import sys
import rclpy

def stop_node(*args):
    print("Stopping node..")
    rclpy.shutdown()
    return True

def main():
    rclpy.init(args=sys.argv)
    print("Creating node..")
    node = rclpy.create_node("mynode")
    print("Running node..")
    while rclpy.ok():
        rclpy.spin_once(node)

if __name__ == '__main__':
    try:
        signal.signal(signal.SIGINT, stop_node)
        signal.signal(signal.SIGTERM, stop_node)
        main()
    except:
        stop_node()
Here is a sample Dockerfile to re-create the image:
FROM osrf/ros2:nightly
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654
RUN apt-get update && \
    apt-get install -y vim
WORKDIR /nodes
COPY mynode.py .
ADD run-node.sh /run-node.sh
RUN chmod +x /run-node.sh
STOPSIGNAL SIGTERM
Here is the sample docker-compose.yml:
version: '3'
services:
  mynode:
    container_name: mynode-container
    image: mynode
    entrypoint: /bin/bash -c "/run-node.sh"
    privileged: true
    stdin_open: false
    tty: true
    stop_signal: SIGTERM
Here is the run-node.sh script:
source /opt/ros/$ROS_DISTRO/setup.bash
python3 /nodes/mynode.py
When I manually run the node inside the container (using python3 mynode.py or /run-node.sh), or when I do docker run -it mynode /bin/bash -c "/run-node.sh", I get the "Stopping node.." message. But when I do docker-compose up, I never see that message when I stop the container, whether by Ctrl+C or by docker-compose down.
$ docker-compose up
Creating network "ros-node_default" with the default driver
Creating mynode-container ... done
Attaching to mynode-container
mynode-container | Creating node..
mynode-container | Running node..
^CGracefully stopping... (press Ctrl+C again to force)
Stopping mynode-container ... done
$
I've tried:
moving the calls to signal.signal
using atexit instead of signal
using docker stop and docker kill --signal
I've also checked the "Python inside docker container, gracefully stop" question, but there's no clear solution there, and I'm not sure if using ROS/rclpy makes my setup different (also, my host machine is Ubuntu 18.04, while that user was on Windows).
Is it possible to catch the stopping of the container in my stop_node method?
When your docker-compose.yml file says:
entrypoint: /bin/bash -c "/run-node.sh"
Since that's a bare string, Docker wraps it in a /bin/sh -c wrapper. So your container's main process is something like
/bin/sh -c '/bin/bash -c "/run-node.sh"'
In turn, the bash script stays running. It launches a Python script, and stays running as its parent until that script exits. (The two levels of sh -c wrappers may or may not stay running.)
The important part here is that this wrapper shell, not your script, is the main container process that receives signals, and (it turns out) won't receive SIGTERM unless it's explicitly coded to.
The most important restructuring to do here is to have your wrapper script exec the Python script. That causes it to replace the wrapper, so it becomes the main process and receives signals. If nothing else, changing the last line to
exec python3 /nodes/mynode.py
will likely help.
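To verify the exec worked, a quick hedged check you could drop into mynode.py's startup: once the wrapper execs the script, Python should report PID 1 (or the PID of the init shim if --init is used):

import os

# expect 1 here when the script is the container's main process
print("running as PID %d" % os.getpid(), flush=True)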
I would go a little further here: make sure as much of this code as possible is built into your Docker image, and try to minimize the number of explicit shell wrappers. "Do some initialization, then exec something" is an extremely common Docker pattern, and you can write this script and make it your image's entrypoint:
#!/bin/sh
# Do the setup
# ("." is the same as "source", but standard)
. "/opt/ros/$ROS_DISTRO/setup.bash"
# Run the main CMD
exec "$#"
Similarly, your main script should start with a "shebang" line like
#!/usr/bin/env python3
import ...
Your Dockerfile already contains the setup to be able to run the wrapper directly; you may need a similar RUN chmod line for the main script. But then you can add
ENTRYPOINT ["/run-node.sh"]
CMD ["/nodes/my-node.py"]
Since both scripts are executable and have "shebang" lines, you can run them directly. Using the JSON syntax keeps Docker from adding an additional shell wrapper. And since your entrypoint script will now run whatever the command is, it's easy to change the command separately. For example, if you want an interactive shell that has already done the environment variable setup, to debug your container startup, you can override just the command part:
docker run --rm -it mynode sh

Python fabric unable to start process

I'm using python fabric to deploy binaries to an ec2 server and am attempting to run them in the background (a subshell).
All the fabric commands for performing local actions, putting files, and executing remote commands w/o elevated privileges work fine. The issue I run into is when I attempt to run the binary.
with cd("deploy"):
run('mkdir log')
sudo('iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080', user="root")
result = sudo('./dbserver &', user="root") # <---- This line
print result
if result.failed:
print "Running dbserver failed"
else:
print "DBServer now running server" # this gets printed despite the binary not running
After I log in to the server and run ps aux | grep dbserver, nothing shows up. How can I get fabric to execute the binary? The same command ./dbserver & executed from the shell does exactly what I want. Thanks.
This is likely related to TTY issues, and/or the fact that you're attempting to background a process.
Both of these are discussed in the FAQ under these two headings:
http://www.fabfile.org/faq.html#init-scripts-don-t-work
http://www.fabfile.org/faq.html#why-can-t-i-run-programs-in-the-background-with-it-makes-fabric-hang
Try making the sudo call like this:
sudo('nohup ./dbserver &', user="root", pty=False)

Run a background process and then kill it after some intermediate processing

Specifically, I'm trying to use fabric to run some tests which rely on the existence of a MongoDB.
I have the following code:
db_cmd = 'mongod'
test_cmd = 'istanbul cover node_modules/mocha/bin/_mocha -- -R spec'
pid = os.spawnl(os.P_NOWAIT, db_cmd)
with shell_env(NODE_ENV='test'):
    local(test_cmd)
I plan to use the PID to kill the process after test_cmd has finished; however, I've not gotten that far yet.
Running test_cmd results in an error suggesting that db_cmd has exited and that MongoDB is no longer available:
Uncaught Error: failed to connect to [localhost:27017]
However running mongod manually before running fabric causes test_cmd to run fine and interact with MongoDB.
I suspect I'm just not understanding os.spawnl. Note that this code needs to run on Windows, Linux, and OS X, so I think I'm somewhat restricted in which os.spawnxxx methods I can use. I'm also interested to know whether there's a fabric method to achieve this.
I successfully use:
os.killpg(process.pid, signal.SIGTERM)
Probably, you need to use the subprocess module for that.
To run mongod in the background, use:
process = subprocess.Popen(
    command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=True, preexec_fn=os.setsid
)
To kill it after the tests, use the command I wrote first.
Here command is a string containing your mongod start command, for example:
mongod --host localhost --port 27018
It works fine for me. If you have problems with the code, please let me know.
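Putting the two snippets together, here is a hedged end-to-end sketch using the commands from this thread; note that preexec_fn=os.setsid and os.killpg are POSIX-only, so this does not cover the Windows case from the question:

import os
import signal
import subprocess

# start mongod in its own process group so the whole group can be signaled later
process = subprocess.Popen(
    "mongod --host localhost --port 27018",
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=True, preexec_fn=os.setsid,
)
try:
    # run the test suite while the database is up
    subprocess.check_call(
        "istanbul cover node_modules/mocha/bin/_mocha -- -R spec",
        shell=True, env=dict(os.environ, NODE_ENV="test"),
    )
finally:
    # tear mongod down even if the tests fail
    os.killpg(process.pid, signal.SIGTERM)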
You can also do this in straight bash with jobs and traps:
#!/bin/bash
trap "kill %1" SIGINT SIGTERM EXIT
mongod --host localhost --port 27018 &
istanbul cover node_modules/mocha/bin/_mocha -- -R spec
exit 0
What this is doing:
Set a trap on the signals SIGINT, SIGTERM, and EXIT to kill the first background job
Start a mongod instance and throw it into the background (the first job)
Run the tests
Trigger the exit signal via exit 0
So this will set up and tear down your mongod instance on completion, even on a term signal or exception.

Exit a Python process not kill it (via ssh)

I am starting my script locally via:
sudo python run.py remote
This script happens to also open a subprocess (if that matters)
webcam = subprocess.Popen('avconv -f video4linux2 -s 320x240 -r 20 -i /dev/video0 -an -metadata title="OfficeBot" -f flv rtmp://6f7528a4.fme.bambuser.com/b-fme/xxx', shell = True)
I want to know how to terminate this script when I SSH in.
I understand I can do:
sudo pkill -f "python run.py remote"
or use:
ps -f -C python
to find the process ID and kill it that way.
However, none of these gracefully kills the process. I want to be able to trigger the equivalent of CTRL/CMD C so an exit routine runs (I do lots of things on shutdown that aren't triggered when the process is simply killed).
Thank you!
You should use "signals" for it:
http://docs.python.org/2/library/signal.html
Example:
import signal, os
def handler(signum, frame):
    print 'Signal handler called with signal', signum
signal.signal(signal.SIGINT, handler)
#do your stuff
then in terminal:
kill -INT $PID
or press Ctrl+C if your script is running in the current shell
http://en.wikipedia.org/wiki/Unix_signal
also this might be useful:
How do you create a daemon in Python?
You can use signals to communicate with your process. If you want to emulate CTRL-C, the signal is SIGINT (which you can send with kill -INT and the process ID). You can also modify the behavior of SIGTERM, which would make your program shut down cleanly under a broader range of circumstances.
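Tying this to the question, a minimal hedged sketch of what run.py's shutdown path could look like (sleep stands in for the avconv pipeline, and the cleanup body is a placeholder):

import signal
import subprocess
import sys

# stand-in for the avconv subprocess from the question
webcam = subprocess.Popen(["sleep", "3600"])

def shutdown(signum, frame):
    webcam.terminate()  # forward the shutdown to the subprocess
    webcam.wait()
    # ...run the rest of your shutdown work here...
    sys.exit(0)

signal.signal(signal.SIGINT, shutdown)   # Ctrl+C
signal.signal(signal.SIGTERM, shutdown)  # plain kill
signal.pause()  # stand-in for the script's real main loop

From the SSH session, kill -INT <pid> (or kill -TERM) then runs the same cleanup path as a local Ctrl+C.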

How to make Python script run as service?

I want to run a python script on a CentOS server:
#!/usr/bin/env python
import socket
try:
    import thread
except ImportError:
    import _thread as thread #Py3K changed it.

class Polserv(object):
    def __init__(self):
        self.numthreads = 0
        self.tidcount = 0
        self.port = 843
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(('100.100.100.100', self.port))
        self.sock.listen(5)
    def run(self):
        while True:
            thread.start_new_thread(self.handle, self.sock.accept())
    def handle(self, conn, addr):
        self.numthreads += 1
        self.tidcount += 1
        tid = self.tidcount
        while True:
            data = conn.recv(2048)
            if not data:
                conn.close()
                self.numthreads -= 1
                break
            #if "<policy-file-request/>\0" in data:
            conn.sendall(b"<?xml version='1.0'?><cross-domain-policy><allow-access-from domain='*' to-ports='*'/></cross-domain-policy>")
            conn.close()
            self.numthreads -= 1
            break
            #conn.sendall(b"[#%d (%d running)] %s" % (tid,self.numthreads,data) )

Polserv().run()
I'm using $ python flashpolicyd.py and it works fine...
The question is: how do I keep this script running even after I close the terminal (console)?
I offer two recommendations:
supervisord
1) Install the supervisor package (more verbose instructions here):
sudo apt-get install supervisor
2) Create a config file for your daemon at /etc/supervisor/conf.d/flashpolicyd.conf:
[program:flashpolicyd]
directory=/path/to/project/root
environment=ENV_VARIABLE=example,OTHER_ENV_VARIABLE=example2
command=python flashpolicyd.py
autostart=true
autorestart=true
3) Restart supervisor to load your new .conf
supervisorctl update
supervisorctl restart flashpolicyd
systemd (if currently used by your Linux distro)
[Unit]
Description=My Python daemon
[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/project/main.py
WorkingDirectory=/opt/project/
Environment=API_KEY=123456789
Environment=API_PASS=password
Restart=always
RestartSec=2
[Install]
WantedBy=sysinit.target
Place this file into /etc/systemd/system/my_daemon.service and enable it using systemctl daemon-reload && systemctl enable my_daemon && systemctl start my_daemon --no-block.
To view logs:
systemctl status my_daemon
I use this code to daemonize my applications. It allows you to start/stop/restart a script using the following commands.
python myscript.py start
python myscript.py stop
python myscript.py restart
In addition to this I also have an init.d script for controlling my service. This allows you to automatically start the service when your operating system boots up.
Here is a simple example to get you going. Simply move your code inside a class, and call it from the run function inside MyDaemon.
import sys
import time
from daemon import Daemon

class YourCode(object):
    def run(self):
        while True:
            time.sleep(1)

class MyDaemon(Daemon):
    def run(self):
        # Or simply merge your code with MyDaemon.
        your_code = YourCode()
        your_code.run()

if __name__ == "__main__":
    daemon = MyDaemon('/tmp/daemon-example.pid')
    if len(sys.argv) == 2:
        if 'start' == sys.argv[1]:
            daemon.start()
        elif 'stop' == sys.argv[1]:
            daemon.stop()
        elif 'restart' == sys.argv[1]:
            daemon.restart()
        else:
            print "Unknown command"
            sys.exit(2)
        sys.exit(0)
    else:
        print "usage: %s start|stop|restart" % sys.argv[0]
        sys.exit(2)
Upstart
If you are running an operating system that uses Upstart (e.g. CentOS 6), you can also use Upstart to manage the service. If you use Upstart you can keep your script as is, and simply add something like this under /etc/init/my-service.conf:
start on started sshd
stop on runlevel [!2345]
exec /usr/bin/python /opt/my_service.py
respawn
You can then use start/stop/restart to manage your service.
e.g.
start my-service
stop my-service
restart my-service
A more detailed example of working with upstart is available here.
Systemd
If you are running an operating system that uses Systemd (e.g. CentOS 7) you can take a look at the following Stackoverflow answer.
My non-Pythonic approach would be using the & suffix. That is:
python flashpolicyd.py &
To stop the script:
killall flashpolicyd.py
Also, combining the & suffix with disown detaches the process from the shell, so it survives when the terminal closes:
python flashpolicyd.py & disown
First import the os module in your app, then get the app's PID with the getpid function and save it to a file. For example:
import os
pid = os.getpid()
op = open("/var/us.pid","w")
op.write("%s" % pid)
op.close()
and create a bash file in /etc/init.d path:
/etc/init.d/servername
PATHAPP="/etc/bin/userscript.py"
PIDAPP="/var/us.pid"
case $1 in
start)
    echo "starting"
    python $PATHAPP &
    ;;
stop)
    echo "stopping"
    PID=$(cat $PIDAPP)
    kill $PID
    ;;
esac
Now you can start and stop your app with the commands below:
service servername stop
service servername start
or
/etc/init.d/servername stop
/etc/init.d/servername start
For my Python scripts, I use...
To START the python script:
start-stop-daemon --start --background --pidfile $PIDFILE --make-pidfile --exec $DAEMON
To STOP the python script:
PID=$(cat $PIDFILE)
kill -9 $PID
rm -f $PIDFILE
P.S.: sorry for poor English, I'm from CHILE :D
