I have a Docker container that executes a Python file. I want the container to restart when the script fails (usually with memory errors).
Docker is running and the Python file works, but after the script fails the container just exits.
The --restart always policy does not seem to work - what am I doing wrong?
docker command:
sudo docker run --init --gpus all \
--ipc host --privileged --net host \
-p 8888:8888 -p 49053:49053 \
--restart always \
-v /mnt/disks/sde:/home/sliceruser/data \
-v /mnt/disks/sdb:/home/sliceruser/dataOld \
slicerpicai:latest
End of the Dockerfile:
ENTRYPOINT [ "/bin/bash", "start.sh","-l", "-c" ]
start.sh
cd /home/sliceruser/data/piCaiCode
git pull
python3.8 /home/sliceruser/data/piCaiCode/Three_chan_baseline_hyperParam.py
The restart policy works, you might just not see it. I suggest checking the number of restarts so far on the container:
docker inspect -f "{{ .RestartCount }}" my-container
Also, Docker retries indefinitely (with --restart always), but it waits longer and longer between attempts if the start keeps failing.
If you say your script has memory issues, it would be good to address those before looking at issues with Docker; if the reason the container stops lies outside Docker, that will obviously keep the container from restarting as well. So I recommend checking the container logs and thinking about what you do to manually restart the container after a failure.
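If the crashes really are unrecoverable memory errors, it can also help to make them visible before the restart. A minimal sketch, assuming your script has a main() entry point (my assumption, not taken from your code):
import sys
import traceback

def main():
    ...  # existing training / processing code

if __name__ == "__main__":
    try:
        main()
    except MemoryError:
        # Log the failure so it shows up in `docker logs`,
        # then exit non-zero so the restart policy recreates the container.
        traceback.print_exc()
        sys.exit(1)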
For more details, check also the official docker run reference.
If you want to reproduce what I wrote above do the following:
Open 1 terminal and run:
docker stats
Open a second terminal and run:
docker run -d --name testcontainer --restart always alpine:latest sh -c "sleep 5 && exit 2"
This will start a container that "crashes" every 5s.
In the same terminal run:
# check the status and see how it waits longer and longer to restart
docker container ls --filter name="testcontainer"
# check the number of restarts so far
docker inspect -f "{{ .RestartCount }}" testcontainer
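Once you are done experimenting, clean up the test container:
# stop and remove the always-restarting container
docker rm -f testcontainer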
Friendly footnote: I think you are lucky it doesn't restart, because this is such an insecure container. ;)
Related
My Objective: I want to be able to restart a container based on the official Python Image using some command inside the container.
My system: I have my own Docker image based on the official Python image, which looks like this:
FROM python:3.6.15-buster
WORKDIR /webserver
COPY requirements.txt /webserver
RUN /usr/local/bin/python -m pip install --upgrade pip
RUN pip3 install -r requirements.txt --no-binary :all:
COPY . /webserver
ENTRYPOINT ["./start.sh"]
As you can see, the image does not execute a single Python file; instead it executes a script called start.sh, which looks like this:
#!/bin/bash
echo "Starting"
echo "Env: $ENTORNO"
exec python3 "$PATH_ENTORNO""Script1.py" &
exec python3 "$PATH_ENTORNO""Script2.py" &
exec python3 "$PATH_ENTORNO""Script3.py" &
All of this works perfectly, but I want the entire container based on this image to be restarted if, for example, script 3 fails.
My approach: I had two ideas for this problem. First, try to execute a reboot command from the python3 script, something like this:
from subprocess import call
[...]
call(["reboot"])
This does not work inside the Python Debian image because of this error:
reboot: command not found
The other approach was to mount the docker.sock inside the container, but the error this time is:
root@MachineName:/var/run# /var/run/docker.sock docker ps
bash: /var/run/docker.sock: Permission denied
I don't know if I'm approaching either of these ideas correctly, but any help will be very much appreciated.
Update
After thinking about it, I realised you could send a signal to PID 1 (your entrypoint), trap it, and use a handler to exit with an appropriate code so that Docker will reschedule it.
Here's an MRE:
Dockerfile
FROM python:3.9
WORKDIR /app
COPY ./ /app
ENTRYPOINT ["./start.sh"]
start.sh
#!/usr/bin/env bash
python script.py &
# This traps user defined signal and kills the last command
# (`tail -f /dev/null`) before exiting with code 1.
trap 'kill ${!}; echo "Killed by backgrounded process"; exit 1' USR1
# Launches `tail` in the background and sets this program to wait
# for it to finish, so that it does not block execution
tail -f /dev/null & wait $!
script.py
import os
import signal
# Process 1 will be your entrypoint if you declared it in `exec-form`*
print("Sending signal to stop container")
os.kill(1, signal.SIGUSR1)
*exec form
Testing it
> docker build . -t test
> docker run test
Sending signal to stop container
Killed by backgrounded process
> docker inspect $(docker container ls -n 1 -q) --format='{{.State.ExitCode}}'
1
Original post
I think the safest bet would be to instruct Docker to restart your container when there's some failure. Then you'd only have to exit your program with a non-zero code (e.g. run exit 1 from your start.sh) and Docker will restart it from scratch.
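For your specific setup with three scripts, the wrapper also needs to notice when one of them dies. A rough sketch of such a supervisor in Python (the script names and PATH_ENTORNO come from your start.sh; everything else is an assumption, not a drop-in replacement):
import os
import subprocess
import sys
import time

# Launch the three scripts and exit non-zero as soon as one of them fails,
# so Docker's restart policy can recreate the whole container.
base = os.environ.get("PATH_ENTORNO", "")
scripts = [base + "Script1.py", base + "Script2.py", base + "Script3.py"]
procs = [subprocess.Popen([sys.executable, s]) for s in scripts]

while procs:
    for p in list(procs):
        code = p.poll()
        if code is None:
            continue          # still running
        if code != 0:
            # One script failed: stop the rest and propagate the failure.
            for other in procs:
                if other.poll() is None:
                    other.terminate()
            sys.exit(1)
        procs.remove(p)       # this one finished cleanly
    time.sleep(1)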
Option 1: docker run --restart
Related documentation
docker run --restart on-failure <image>
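If you don't want it to retry forever, on-failure also accepts a maximum retry count:
docker run --restart on-failure:5 <image>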
Option 2: Using docker-compose
Version 3
In your docker-compose.yml you can set the restart_policy directive (nested under deploy) for the service you're interested in restarting, e.g.:
version: "3"
services:
  app:
    ...
    deploy:
      restart_policy:
        condition: on-failure
    ...
Version 2
Before version 3, the same policy could be applied with the restart directive, which allows for less configuration.
version: "2"
services:
app:
...
restart: "on-failure"
...
Is there any reason why you are running 3 processes in the same container? As per microservice architecture basics, only one process should run in a container. So you should run 3 containers for the 3 scripts. All 3 scripts should have the logic that if one of the 3 containers is not reachable, then it should get killed.
Well, in the end the solution was much simpler than I expected.
I started from the approach of mounting the Docker socket inside the container (I know this practice is not recommended, but in my case I know it does not pose security problems), using this directive in docker-compose:
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
Then it was as simple as using the Docker SDK for Python, which exposes the full API through that socket and allowed me to restart the container from inside the Python script in a very simple way.
import docker
[...]
docker_client = docker.DockerClient(base_url='unix://var/run/docker.sock')
docker_client.containers.get("container_name").restart()
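If you would rather not hardcode the container name, one option is to look the container up by its own hostname. This assumes the default hostname (Docker sets it to the container ID) and is not part of the original setup:
import socket
import docker

docker_client = docker.DockerClient(base_url='unix://var/run/docker.sock')
# By default a container's hostname is its (short) container ID,
# so the container can restart itself without knowing its name.
docker_client.containers.get(socket.gethostname()).restart()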
I have a Dockerfile:
FROM python:3
WORKDIR /app
ADD ./venv ./venv
ADD ./data/file1.csv.gz ./data/file1.csv.gz
ADD ./data/file2.csv.gz ./data/file2.csv.gz
ADD ./requirements.txt ./venv/requirements.txt
WORKDIR /app/venv
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "./src/script.py", "/app/data/file1.csv.gz", "/app/data/file2.csv.gz"]
After building an image from it and running it, the image runs the app as it should, but then the container shuts down immediately after finishing. This is definitely problematic since I can't inspect the output file.
I have tried using docker run -d -t <imgname> and docker ps shows the app for a few seconds, but once again, as soon as it finishes the process, the container shuts itself down.
So it's impossible to access; even with docker exec <imgid> -it --entrypoint /bin/bash, it just immediately exits.
I've also tried adding a last RUN /bin/bash after the last CMD, but it doesn't help either.
What can I do to actually be able to log into the container and inspect the file?
As long as the container hasn't been removed, you will be able to get at the data. You can find the name of the container using docker ps -a.
Then, if you know the location of the file, you can copy it to your host using
docker cp <container name>:<file> .
Alternatively, you can commit the contents of the container to a new image and run a shell in that using
docker commit <container name> newimagename
docker run --rm -it newimagename /bin/bash
Then you can look around in the container and find your files.
Unfortunately there's no way to start the container up again and look around in it. docker start will start the container, but will run the same command again as was run when you did docker run.
I'm trying to Dockerize a web service using Tangelo and python.
My project structure is as follows:
test.py
requirements.txt
Dockerfile
test.py
import ...
def run(query):
...
return response
requirements.txt
... # other packages, numpy, open-cv, etc
tangelo
Dockerfile
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y python python-pip git
EXPOSE 9220
ADD . /test
WORKDIR /test
RUN pip install -r requirements.txt
CMD "tangelo --port 9220"
I build this using
docker build -t "test" .
And run in detached mode using
docker run -p 9220:9220 -d "test"
But docker ps shows me that the container stops almost as soon as it has started. I don't know what the problem is since I cannot inspect the logs.
I have tried a lot of things but I still can't figure this thing out.
Any ideas? If needed, I can provide more info.
EDIT:
When I build, step 8 says
Step 8/8 : ENTRYPOINT tangelo --port 9220
---> Running in 8b54841853ab
Removing intermediate container 8b54841853ab
So it means these are run in an intermediate container. Why is that and how can I prevent it?
TL;DR: Use:
CMD tangelo -np --port 9220
Instead of:
CMD "tangelo --port 9220"
Explanation:
You have two ways to debug the problem:
Inspect the logs of the container:
$ docker run -d test
28684015e519c0c8d644fccf98240d1465acabab6d16c19fd59c5f465b7f18af
$ sudo docker logs 28684015e519c
/bin/sh: 1: tangelo --port 9220: not found
Instead of running in detached mode, run in foreground with -i/--interactive (and optionally also -t/--tty):
$ docker run -ti test
/bin/sh: 1: tangelo --port 9220: not found
As you can see from above, the problem is that tangelo --port 9220 is being interpreted as a single argument. Split it by removing quotes:
CMD tangelo --port 9220 # this will use a shell
or use the "exec" form (preferred, given that you don't need any shell features):
CMD ["tangelo", "--port", "9220"] # this will execute tangelo directly
or even better use ENTRYPOINT + CMD:
ENTRYPOINT ["tangelo"]
CMD ["--port", "9220"] # this will execute tangelo directly
After this change, you'll still have a problem:
$ sudo docker run -ti test
...
[29/Apr/2018:02:43:39] TANGELO no such group 'nobody' to drop privileges to
Tangelo is complaining about the fact that there is no user and group named nobody inside the container. Again, there are two things you can do: add a RUN to create the nobody user and group, or run Tangelo with the -np/--no-drop-privileges option:
ENTRYPOINT ["tangelo"]
CMD ["--no-drop-privileges", "--port", "9220"]
It's fine if during the build you see intermediate containers: Docker creates them for each build step. The commands you specify in ENTRYPOINT or CMD are not executed during build, they're just recorded into the final image.
I'm trying to use docker-compose to run a Python script in one container that populates a database in a separate container. My problem is that the script launches before the database is ready to accept connections. Is there a way to avoid this and still use docker-compose?
My other alternative is to create a shell script that fires each of the docker container commands serially, but I would rather use docker-compose if possible.
Here is the docker-compose.yml file:
etl:
  build: ./etl
  links:
    - mysql
mysql:
  image: mariadb
  environment:
    MYSQL_DATABASE: my_db
    MYSQL_ROOT_PASSWORD: a_password
Here's my work-around shell script:
#!/bin/bash
docker run --name mariadb -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mariadb:latest
docker build -t etl ./etl
docker run -it --rm --name my-etl --link mariadb:mysql etl
You can have the etl container sleep before it starts the script, e.g.:
docker run -it --rm --name my-etl --link mariadb:mysql etl /bin/bash -c "sleep 10 && python your-script"
or in Dockerfile
CMD ["/bin/bash", "-c", "sleep 10 && python your-script"]
This way the etl container will sleep for 10 seconds before it starts the python script.
A fixed sleep 10 is one option.
Another is to have the etl code retry the connection a few times, with a shorter sleep between each attempt (~1s). When the connection is successful you know the container is ready, so the script can proceed.
By attempting the connection multiple times with a shorter sleep, you'll wait less time on average.
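A minimal sketch of such a retry loop, assuming the etl script connects with pymysql (any MySQL client works; the host name mysql comes from the compose link, the credentials from the environment above):
import time
import pymysql

def wait_for_db(host="mysql", user="root", password="a_password", retries=30, delay=1.0):
    # Keep trying to connect until the database accepts connections,
    # sleeping briefly between attempts.
    for _ in range(retries):
        try:
            pymysql.connect(host=host, user=user, password=password).close()
            return
        except pymysql.err.OperationalError:
            time.sleep(delay)
    raise RuntimeError("database never became reachable")

wait_for_db()
# ... proceed with the actual ETL work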
docker container exited immediately after python script execution:
docker run -t -i -v /root/test.py:/test.py zookeeper python test.py
(test.py starts the ZooKeeper service.)
The command succeeds, but the container exits immediately instead of staying up. I could NOT start the container again with "docker start <container id>".
Manually running "python test.py" inside the container works, but not when it runs via "docker run ....".
Just starting the server is not enough. When the CMD exits, so does the container. Thus, if you start a service that's a daemon, you need to keep your process alive. This can be achieved by, for example, tailing the service log file. supervisord is another way to run processes and keep the CMD alive.
For example, you might do
CMD /test.py && tail -F /var/log/zookeeper.log
Running from the commandline you could do something similar
docker run -t -i -v /root/test.py:/test.py zookeeper bash -c "python test.py && tail -F /var/log/zookeeper.log"
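If you prefer to stay in Python rather than chaining shell commands, a rough sketch of the same idea (run the existing script, then block so the container's main process never exits; /test.py is the path from the question):
import signal
import subprocess

# Start the service via the existing script, ...
subprocess.run(["python", "/test.py"], check=True)
# ... then sleep until a signal arrives (e.g. docker stop),
# so PID 1 stays alive and the container keeps running.
signal.pause()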