I am quite new to the whole world of Docker. I am trying to establish an environment for several different Python machine learning applications, each of which should act independently from the others in its own Docker container. Since I don't yet really understand how to use and extend base images, I am using a separate Dockerfile for each new application that defines the packages it uses. They all have one thing in common: they use FROM python:3.6-slim as the base.
I am looking for a starting point or a way to easily extend this base image into a new image that contains the individual packages each application needs, in order to save disk space. Right now each of the images is roughly 1 GB, and hopefully this could be a way to reduce that.
Without going into detail about the different storage backend solutions for Docker (check Docker - About Storage Drivers for reference), Docker reuses all the intermediate layers that images share.
Having said that, even though docker images reports [1.17 GB, 1.17 GB, 1.17 GB, 138 MB, 918 MB], it does not mean that the sum of those sizes is used on your storage. We can say the following:
sum(`docker images`) <= space used on disk
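You can check this on your own machine with docker system df; the -v flag breaks the usage down per image and shows how much of each image is shared with others:
docker system df        # summary of disk usage for images, containers and volumes
docker system df -v     # per-image breakdown, including shared vs. unique size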
Each of the steps in the Dockerfile creates a layer.
Let's take the following project structure:
├── common-requirements.txt
├── Dockerfile.1
├── Dockerfile.2
├── project1
│   ├── requirements.txt
│   └── setup.py
└── project2
    ├── requirements.txt
    └── setup.py
With Dockerfile.1:
FROM python:3.6-slim
# - here we have a layer from python:3.6-slim -
# 1. Copy requirements and install dependencies
# we do this first because we assume that requirements.txt changes
# less than the code
COPY ./common-requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
# - here we have a layer from python:3.6-slim + your own requirements-
# 2. Install your python package in project1
COPY ./project1 /code
RUN pip install -e /code
# - here we have a layer from python:3.6-slim + your own requirements
# + the code install
CMD ["my-app-exec-1"]
With Dockerfile.2:
FROM python:3.6-slim
# - here we have a layer from python:3.6-slim -
# 1. Copy requirements and install dependencies
# we do this first because we assume that requirements.txt changes
# less than the code
COPY ./common-requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
# == here we have a layer from python:3.6-slim + your own requirements ==
# == both containers are going to share the layers until here ==
# 2. Install your python package in project2
COPY ./project2 /code
RUN pip install -e /code
# == here we have a layer from python:3.6-slim + your own requirements
# + the code install ==
CMD ["my-app-exec-2"]
The two Docker images are going to share the layers with Python and common-requirements.txt. This is extremely useful when building applications with a lot of heavy libraries.
To build the images, I would run:
docker build -t app-1 -f Dockerfile.1 .
docker build -t app-2 -f Dockerfile.2 .
So keep in mind that the order of the steps in your Dockerfile matters.
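You can verify the sharing after building both images by comparing their layer digests; the leading entries (the python:3.6-slim base plus the common requirements layer) should be identical, and only the last ones should differ:
docker image inspect -f '{{json .RootFS.Layers}}' app-1
docker image inspect -f '{{json .RootFS.Layers}}' app-2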
This is a concept question about images and containers. I have a Python 3.9 image on Ubuntu, and a main.py that will be altered and rewritten. However, when I change the contents of main.py in my host computer's files, the main.py inside the container does not stay in sync: its contents only change if I edit it within the container itself.
Is there a way to have the files within the container pick up the latest updates whenever something changes on the host computer, so the files stay one to one? I obviously want to edit main.py from my host computer, because if I did it the other way around the container's changes wouldn't be visible outside it.
Host Computer:
.
├── files/
│ ├── main.py
│ └── text.csv
└── Dockerfile
Container Directory:
test-poetry/
├─tests/
│ └─__init__.py
├─README.md
├─pyproject.toml
├─test_poetry/
│ ├─__init__.py
│ ├─main.py
│ └─text.csv
└─poetry.lock
Dockerfile contents:
#Python - 3.9 - ubuntu
FROM python:3.9-slim
ENTRYPOINT [ "/bin/bash" ]
WORKDIR /src/test-poetry/test_poetry
COPY files .
You need to mount files into the container as a volume.
Dockerfile:
FROM python:3.9-slim
# makes no sense to have this as a long path
WORKDIR /project
# keep this towards the end of the file for clarity
ENTRYPOINT [ "/bin/bash" ]
Then build the image and run your container with:
docker build -t test/poetry:0.1 .
docker container run --rm -ti -v $(pwd)/files:/project test/poetry:0.1
NOTE:
For your purposes you can even skip the build completely and run a simple container like:
docker run \
--rm -ti \
-v $(pwd)/files:/project \
--workdir /project \
python:3.9-slim bash
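A quick way to convince yourself that the bind mount stays in sync (a rough sketch; poetry-dev is just a hypothetical container name):
docker run -d --name poetry-dev -v $(pwd)/files:/project --workdir /project python:3.9-slim sleep infinity
echo 'print("edited on the host")' >> files/main.py
docker exec poetry-dev cat /project/main.py   # shows the host edit immediately
docker rm -f poetry-dev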
I have a Python app that takes the value of a certificate in a Dockerfile and updates it. However, I'm having difficulty figuring out how to get the app to work within GitLab.
When I push the app along with the Dockerfile to be updated, I want the app to run in the GitLab pipeline and update the Dockerfile. I'm a little stuck on how to do this. I'm thinking that I would need to pull the repo, run the app, and then push the changes back up.
I would like some advice on whether this is the right approach and, if so, how I would go about doing it.
This is just an example of the Dockerfile to be updated (I know this image wouldn't actually work, but the app only updates the ca-certificates entry in the Dockerfile):
#syntax=docker/dockerfile:1
#init the base image
FROM alpine:3.15
#define present working directory
#WORKDIR /library
#run pip to install the dependencies of the flask app
RUN apk add -u \
ca-certificates=20211220 \
git=3.10
#copy all files in our current directory into the image
COPY . /library
EXPOSE 5000
#define command to start the container, need to make app visible externally by specifying host 0.0.0.0
CMD [ "python3", "-m", "flask", "run", "--host=0.0.0.0"]
gitlab-ci.yml:
stages:
  - build
  - test
  - update_certificate

variables:
  PYTHON_IMG: "python:3.10"

pytest_installation:
  image: $PYTHON_IMG
  stage: build
  script:
    - pip install pytest
    - pytest --version

python_requirements_installation:
  image: $PYTHON_IMG
  stage: build
  script:
    - pip install -r requirements.txt

unit_test:
  image: $PYTHON_IMG
  stage: test
  script:
    - pytest ./tests/test_automated_cert_checker.py

cert_updater:
  image: $PYTHON_IMG
  stage: update_certificate
  script:
    - pip install -r requirements.txt
    - python3 automated_cert_updater.py
I'm aware there's a lot of repetition with installing the requirements multiple times and that this is an area for improvement. It doesn't feel necessary for the app to be built into an image, because it's only used for updating the Dockerfile.
requirements.txt installs pytest and BeautifulSoup4
Additional context: the pipeline that builds the Docker image already exists and builds successfully. I am looking for a way to run this app once a day to check whether the ca-certificates version is still up to date. If it isn't, the app runs, the ca-certificates version in the Dockerfile is updated, and the updated Dockerfile is rebuilt automatically.
My thought is that I may need to set up the gitlab-ci.yml to pull the repo, run the app (which updates the ca-certificates version), and then push it back, so that a new image is built from the updated certificate.
The Dockerfile shown here is just a basic example of what the actual Dockerfile in the repo looks like.
What you probably want to do is identify the appropriate version before you build the Dockerfile, then pass it in with a --build-arg for the ca-certificates version. That way, if the arg changes, the cached layer becomes invalid and the new version gets installed; if the version is the same, the cached layer is reused.
FROM alpine:3.15
ARG CA_CERT_VERSION
RUN apk add -u \
ca-certificates=$CA_CERT_VERSION \
git=3.10
# ...
Then when you build your image, you should figure out the appropriate ca-certificates version and pass it as a build-arg.
Something like:
version="$(python3 ./get-cacertversion.py)" # you implement this
docker build --build-arg CA_CERT_VERSION=$version -t myimage .
Be sure to add appropriate bits to leverage docker caching in GitLab.
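For example, the build job's script could look roughly like this (a sketch: get-cacertversion.py is the hypothetical helper from above, and myimage stands in for your registry path):
version="$(python3 ./get-cacertversion.py)"
docker pull myimage:latest || true        # warm the layer cache on a fresh CI runner
docker build \
  --cache-from myimage:latest \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --build-arg CA_CERT_VERSION="$version" \
  -t myimage:latest .
docker push myimage:latest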
I'm trying to deploy a Python app as a Docker container using Dockerfile and docker-compose.
The project structure is this:
ms-request
  - src
      __init__.py
      - exceptions
          __init__.py
          ms_request_exceptions.py
      - messaging
          __init__.py
          receive_rabbit.py
          send_rabbit.py
      - request
          __init__.py
          bsrequest.py
  - test
      __init__.py
      test_bsrequest.py
  Dockerfile
  requirements.txt
In my receive_rabbit.py script, I am importing functions from the request and messaging packages like so:
from src.request import bsrequest
from src.messaging.send_rabbit import send_message
Executing this using PyCharm works fine. Running it from the command line initially didn't work, until I updated the PYTHONPATH using export PYTHONPATH=${PYTHONPATH}:..
I would like to deploy this as a Docker container, so I created a Dockerfile and an entry in my docker-compose.yml for the project.
Dockerfile:
FROM python:3
WORKDIR /bsreq
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src
COPY test/ ./test
RUN export PYTHONPATH=${PYTHONPATH}:.
CMD [ "python", "/bsreq/src/messaging/receive_rabbit.py" ]
docker-compose.yml:
version: "3.3"
services:
rabbitmq: [...]
bs-request:
build: ./ms-request/
depends_on:
- rabbitmq
env_file:
- rabbit.env
[...]
Running this using docker-compose up bs-request always ends in a crash with the error No module named 'src'.
I have tried multiple variations of inputs for the WORKDIR, COPY, PYTHONPATH and CMD lines in the Dockerfile. All lead to the same error. I've tried relative imports, which throw Attempted relative import with no known parent package.
I hope this is an issue others have encountered before. What do I need to do to get this deployment working?
Because Docker builds an image layer by layer, your export stops having any effect as soon as the associated RUN command finishes.
FROM python:3
WORKDIR /bsreq
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src
COPY test/ ./test
RUN export PYTHONPATH=${PYTHONPATH}:. # --> the export only exists for the duration of this RUN step
CMD [ "python", "/bsreq/src/messaging/receive_rabbit.py" ] # --> by the time CMD runs, the export is gone
As a workaround, set an environment variable that persists through the build AND into the final image with the ENV PYTHONPATH=${PYTHONPATH}:. instruction.
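You can also get the same effect at runtime without rebuilding, e.g. (a sketch using the service name from the compose file above):
docker-compose run --rm -e PYTHONPATH=/bsreq bs-request
# or run the entrypoint as a module from the WORKDIR, which makes "src" importable without PYTHONPATH
docker-compose run --rm bs-request python -m src.messaging.receive_rabbit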
extra read: https://vsupalov.com/docker-build-time-env-values/
In any case, the suggested method is to write a setup.py file and install your package with python setup.py install, so it is installed as a proper package and the imports resolve.
P.S.
A better, more modern way is to use a tool such as Poetry, which relies on pyproject.toml (per PEP 517 and PEP 518) and is where Python packaging is heading!
bonus read: https://python-poetry.org/
good luck!
I want to create a container that contains two Python packages as well as a package consisting of an executable file.
Here's my main package (dockerized_project) tree:
dockerized_project
├── docker-compose.yml
├── Dockerfile
├── exec_project
│   ├── config
│   │   └── config.json
│   ├── config.json
│   ├── gowebapp
├── pythonic_project1
│   ├── __main__.py
│   ├── requirements.txt
│   ├── start.sh
│   └── utility
│       └── utility.py
└── pythonic_project2
    ├── collect
    │   ├── collector.py
    ├── __main__.py
    ├── requirements.txt
    └── start.sh
Dockerfile content:
FROM ubuntu:18.04
RUN apt update
RUN apt-get install -y python3.6 python3-pip python3-dev build-essential gcc \
libsnmp-dev snmp-mibs-downloader
RUN pip3 install --upgrade pip
RUN mkdir /app
WORKDIR /app
COPY . /app
WORKDIR /app/snmp_collector
RUN pip3 install -r requirements.txt
WORKDIR /app/proto_conversion
RUN pip3 install -r requirements.txt
WORKDIR /app/pythonic_project1
CMD python3 __main__.py
WORKDIR /app/pythonic_project2
CMD python3 __main__.py
WORKDIR /app/exec_project
CMD ["./gowebapp"]
docker-compose content:
version: '3'
services:
  proto_conversion:
    build: .
    image: pc:2.0.0
    container_name: proto_conversion
    # command:
    #   - "bash snmp_collector/start.sh"
    #   - "bash proto_conversion/start.sh"
    restart: unless-stopped
    ports:
      - 8008:8008
    tty: true
Problem:
When I run this project with docker-compose up --build, only the last CMD command runs. Hence, I think the previous CMD commands in the Dockerfile are overridden, because when I remove the last two CMDs, the first CMD works fine.
Is there any approach to run multiple Python scripts and an executable file in the background?
I've also tried with the bash files without any success either.
As mentioned in the documentation, there can be only one CMD instruction in a Dockerfile; if there are more, the last one overrides the others and takes effect.
A key point of using Docker is to isolate your programs, so at first glance you might want to move them into separate containers and have them talk to each other through a shared volume or a Docker network. But if you really need them to run in the same container, putting them in a bash script and replacing the last CMD with CMD ["./run.sh"] will run them alongside each other:
#!/bin/bash
# start the first script in the background...
python3 /path/to/script1.py &
# ...and keep the second in the foreground so the container stays alive
exec python3 /path/to/script2.py
Add COPY run.sh to the Dockerfile and use RUN chmod a+x run.sh to make it executable. CMD should be CMD ["./run.sh"]
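If you do stay with a single container, a slightly more defensive run.sh (a sketch; wait -n needs bash 4.3+) makes the container exit as soon as either process dies, so a restart policy can kick in:
#!/bin/bash
python3 /path/to/script1.py &
python3 /path/to/script2.py &
wait -n        # returns as soon as the first background job exits
exit $?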
try it via entrypoint.sh
ENTRYPOINT ["/docker_entrypoint.sh"]
docker_entrypoint.sh
#!/bin/bash
set -e
# run the first script in the background and the second in the foreground
python3 not__main__.py &
exec python3 __main__.py
The & symbol means the service runs in the background, like a daemon.
Best practice is to launch these as three separate containers. That's doubly true since you're taking three separate applications, bundling them into a single image, and then trying to launch three separate things from it.
Create a separate Dockerfile in each of your project subdirectories. These can be simpler, especially for the one that just contains a compiled binary:
# execproject/Dockerfile
FROM ubuntu:18.04
WORKDIR /app
COPY . ./
CMD ["./gowebapp"]
Then in your docker-compose.yml file have three separate stanzas to launch the containers
version: '3'
services:
  pythonic_project1:
    build: ./pythonic_project1
    ports:
      - 8008:8008
    environment:
      PY2_URL: 'http://pythonic_project2:8009'
      GO_URL: 'http://execproject:8010'
  pythonic_project2:
    build: ./pythonic_project2
  execproject:
    build: ./execproject
If you really can't rearrange your Dockerfiles, you can at least launch three containers from the same image in the docker-compose.yml file:
services:
  pythonic_project1:
    build: .
    working_dir: /app/pythonic_project1
    command: python3 __main__.py
  pythonic_project2:
    build: .
    working_dir: /app/pythonic_project2
    command: python3 __main__.py
There are several good reasons to structure your project with multiple containers and images:
- If you roll your own shell script and use background processes (as other answers here do), the script simply won't notice if one of the processes dies; with separate containers you can use Docker's restart mechanism to restart them individually.
- If you have an update to one of the programs, you can rebuild and restart only that single container and leave the rest intact (see the command sketch after this list).
- If you ever use a more complex container orchestrator (Docker Swarm, Nomad, Kubernetes), the different components can run on different hosts, each requiring a smaller block of CPU/memory resources on a single node.
- If you ever use a more complex container orchestrator, you can individually scale up the components that use more CPU.
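For example, after changing only pythonic_project1 you can rebuild and restart just that one service, leaving the others running (assuming the first compose file above):
docker-compose up -d --build pythonic_project1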
I am trying to integrate Docker into my Django workflow, and I have everything set up except one really annoying issue: if I want to add dependencies to my requirements.txt file, I basically have to rebuild the entire container image for those dependencies to stick.
For example, I followed the docker-compose example for Django here. The YAML file is set up like this:
db:
  image: postgres
web:
  build: .
  command: python manage.py runserver 0.0.0.0:8000
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - db
and the Dockerfile used to build the web container is set up like this:
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/
So when the image is built for this container requirements.txt is installed with whatever dependencies are initially in it.
If I am using this as my development environment it becomes very difficult to add any new dependencies to that requirements.txt file because I will have to rebuild the container for the changes in requirements.txt to be installed.
Is there some sort of best practice out there in the django community to deal with this? If not, I would say that docker looks very nice for packaging up an app once it is complete, but is not very good to use as a development environment. It takes a long time to rebuild the container so a lot of time is wasted.
I appreciate any insight. Thanks.
You could mount requirements.txt as a volume when using docker run (untested, but you get the gist):
docker run -v $(pwd)/requirements.txt:/code/requirements.txt container:tag
Then you could bundle a script with your container that runs pip install -r requirements.txt before starting your application, and use that as your ENTRYPOINT. I love the custom entrypoint script approach; it lets me do a little extra work without needing to build a new image.
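A minimal sketch of such an entrypoint script, assuming the /code path from the question's Dockerfile:
#!/bin/sh
# install whatever is currently in the (mounted) requirements file, then hand off to the real command
pip install -r /code/requirements.txt
exec "$@"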
That said, if you're changing your dependencies, you're probably changing your application and you should probably make a new container and tag it with a later version, no? :)
So I changed the yaml file to this:
db:
  image: postgres
web:
  build: .
  command: sh startup.sh
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - db
I made a simple shell script startup.sh:
#!/bin/bash
#restart this script as root, if not already root
[ `whoami` = root ] || exec sudo $0 $*
pip install -r dev-requirements.txt
python manage.py runserver 0.0.0.0:8000
and then made a dev-requirements.txt that is installed by the above shell script as sort of a dependency staging environment.
When I am satisfied with a dependency in dev-requirements.txt, I just move it over to requirements.txt to be committed in the next build of the image. This gives me the flexibility to play with adding and removing dependencies while developing.
I think the best way is to ignore what is currently the most common way to install Python dependencies (pip install -r requirements.txt) and specify your requirements directly in the Dockerfile, effectively getting rid of the requirements.txt file. Additionally, you get Docker's layer caching for free.
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
# make sure you install requirements before the ADD, since the ADD invalidates the cache whenever your code changes
RUN pip install flask==0.10.1
RUN pip install sqlalchemy==1.0.6
...
ADD . /code/
If the Docker container is the only way your application is ever run, then I would suggest you do it this way. If you want to support other means of setting up your code (e.g. virtualenv) then this is of course not for you, and you should fall back to either using a requirements file or a setup.py routine. Either way, I found this way to be the most simple and straightforward, without dealing with all the messy Python package distribution issues.
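With this layout, each pinned RUN pip install line is its own cached layer, so a code-only change rebuilds almost instantly. Roughly:
docker build -t myapp .    # first build installs flask and sqlalchemy
# ...edit application code only...
docker build -t myapp .    # rebuild: the pip install steps report "Using cache"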