Minimize docker image with python and R

I have the following Dockerfile:
FROM ubuntu:latest
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python \
&& pip3 install --upgrade pip
# Setup the Python's configs
RUN pip install --upgrade pip && \
pip install --no-cache-dir matplotlib==3.0.2 pandas==0.23.4 numpy==1.16.3 && \
pip install --no-cache-dir pybase64 && \
pip install --no-cache-dir scipy && \
pip install --no-cache-dir dask[complete] && \
pip install --no-cache-dir dash==1.6.1 dash-core-components==1.5.1 dash-bootstrap-components==0.7.1 dash-html-components==1.0.2 dash-table==4.5.1 dash-daq==0.2.2 && \
pip install --no-cache-dir plotly && \
pip install --no-cache-dir adjustText && \
pip install --no-cache-dir networkx && \
pip install --no-cache-dir scikit-learn && \
pip install --no-cache-dir tzlocal
# Setup the R configs
RUN apt-get update
RUN apt-get install -y software-properties-common
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/'
RUN apt update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt install -y r-base
RUN pip install rpy2==2.9.4
RUN apt-get -y install libxml2 libxml2-dev libcurl4-gnutls-dev libssl-dev
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'https://cran.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('BiocManager')"
RUN Rscript -e "BiocManager::install('ggplot2')"
RUN Rscript -e "BiocManager::install('DESeq2')"
RUN Rscript -e "BiocManager::install('RColorBrewer')"
RUN Rscript -e "BiocManager::install('ggrepel')"
RUN Rscript -e "BiocManager::install('factoextra')"
RUN Rscript -e "BiocManager::install('FactoMineR')"
RUN Rscript -e "BiocManager::install('apeglm')"
WORKDIR /
# Copy all the necessary files of the app to the container
COPY ./ ./
# Install the slider-input component
WORKDIR /slider_input
RUN pip install --no-cache-dir slider_input-0.0.1.tar.gz
WORKDIR /
EXPOSE 8050
# Launch the app
CMD ["python", "./app.py"]
It's used to run a Dash app that calls R commands, and it works fine.
The problem is the size of the image.
I want to make the image as small as possible, but everything I tried was unsuccessful because of the combination of Python and R.
Do you have any idea how I can shrink this image while keeping the same functionality?

Use docker-slim to minimize and secure your Docker images. docker-slim profiles your image and throws away what you don't need.
It has been used with Node.js, Python, Ruby, Java, Golang, Rust, Elixir and PHP (some app types) running on Ubuntu, Debian, CentOS, Alpine and even Distroless.
docker-slim is production ready, but test your container before deploying it. It can shrink images by up to 30x while making them more secure.
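For example, after building your image as usual, a single docker-slim invocation produces a minified copy (the image name and flags here are illustrative; check `docker-slim build --help` for your version):

```shell
# Build the fat image first
docker build -t my-dash-app .

# Let docker-slim profile a running container and strip unused files.
# --http-probe exercises the exposed HTTP port (8050 for Dash) during
# profiling so that dynamically loaded files are kept.
docker-slim build --http-probe --expose 8050 my-dash-app

# The minified result is tagged my-dash-app.slim
docker images | grep my-dash-app
```

Because docker-slim works by observing what the container actually touches, make sure your app's main code paths (including the R calls via rpy2) run during the probe phase, or the pruned image may be missing shared libraries it needs.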

A multi-stage build will allow you to omit the compiler toolchain, headers, etc.. from the final image, only including the resulting code.
A three-part tutorial for Python specifically starts here: https://pythonspeed.com/articles/smaller-python-docker-images/
And the generic Docker docs: https://docs.docker.com/develop/develop-images/multistage-build/
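As a rough sketch of how that could look for a pip-heavy image like yours (package pins taken from your Dockerfile; the R side still needs its runtime libraries at the end, so the savings come mostly from dropping build tools, headers, and caches):

```dockerfile
# --- build stage: has compilers and headers ---
FROM ubuntu:bionic AS builder
RUN apt-get update && apt-get install -y python3-pip python3-dev
# Install everything into an isolated prefix we can copy wholesale
RUN pip3 install --no-cache-dir --prefix=/install \
    matplotlib==3.0.2 pandas==0.23.4 numpy==1.16.3 scipy scikit-learn

# --- final stage: runtime only, no -dev packages ---
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
COPY . /app
WORKDIR /app
CMD ["python3", "app.py"]
```

Independently of multi-stage builds, `--no-install-recommends` and removing `/var/lib/apt/lists` in the same layer as each `apt-get install` are worth applying to the R installation steps too, since `r-base` pulls in a large set of recommended packages by default.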


Install local package through a Dockerfile

I have started learning Docker and I have developed a Python package (not published anywhere; it is just used internally) that installs and works fine locally (here I will call it mypackage). However, when I try to install it in a Docker container, Python in the container fails to recognise it, even though no error was raised during the image build. The Dockerfile looks like this:
# install Ubuntu 20.04
FROM ubuntu:20.04
# update Ubuntu packages
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt upgrade -y
RUN apt install -y apt-utils \
build-essential \
curl \
mysql-server \
libmysqlclient-dev \
libffi-dev \
libssl-dev \
libxml2-dev \
libxslt1-dev \
unzip \
zlib1g-dev
# install Python 3.9
RUN apt-get install -y software-properties-common gcc && \
add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3.9-distutils python3-pip python3-apt python3.9-venv
# make symlink (overriding default Python3.8 installed with Ubuntu)
RUN rm /usr/bin/python3
RUN ln -s /usr/bin/python3.9 /usr/bin/python3
# copy package files and source code
RUN mkdir mypackage
COPY pyproject.toml setup.cfg setup.py requirements.txt ./mypackage/
COPY src mypackage/src/
# add path
ENV PACKAGE_PATH=/mypackage/
ENV PATH="$PACKAGE_PATH/:$PATH"
# install mypackage
RUN pip3 install -e ./mypackage
CMD ["python3.9", "main.py"]
So the above builds successfully, but if I run sudo docker run -it test_image /bin/bash and then pip3 list, the package is not there, and I get a ModuleNotFoundError when running code that depends on mypackage. Interestingly, if I create a virtual environment by replacing this:
ENV PACKAGE_PATH=/mypackage/
ENV PATH="$PACKAGE_PATH/:$PATH"
by this:
ENV VIRTUAL_ENV=/opt/venv
RUN python3.9 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
it works. Ideally, I want to know why I need to create a virtual environment and how can I run local packages in a container without creating virtual environments.

How to set default python3 to py3.8 in the Dockerfile?

I tried to alias python3 to python3.8 in the Dockerfile, but it doesn't work for me. I am using ubuntu:18.04.
Step 25/41 : RUN apt-get update && apt-get install -y python3.8
---> Using cache
---> 9fa81ca14a53
Step 26/41 : RUN alias python3="python3.8" && python3 --version
---> Running in d7232d3c8b8f
Python 3.6.9
As you can see the python3 is still 3.6.9. How can I fix this issue?
Thanks.
EDIT
Just attach my Dockerfile:
##################################################################################################################
# Build
#################################################################################################################
#FROM openjdk:8
FROM ubuntu:18.04
############## Linux and perl packages ###############
RUN apt-get update && \
apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer && \
apt-get update -y && \
apt-get install curl groff python-gdbm -y;
# Fix certificate issues, found as of
# https://bugs.launchpad.net/ubuntu/+source/ca-certificates-java/+bug/983302
RUN apt-get update && \
apt-get install -y ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer;
# Setup JAVA_HOME, this is useful for docker commandline
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME
# install git
RUN apt-get update && \
apt-get install -y mysql-server && \
apt-get install -y uuid-runtime git jq python python-dev python-pip python-virtualenv libdbd-mysql-perl && \
rm -rf /var/lib/apt/lists/* && \
apt-get install perl && \
perl -MCPAN -e 'CPAN::Shell->install("Inline")' && \
perl -MCPAN -e 'CPAN::Shell->install("DBI")' && \
perl -MCPAN -e 'CPAN::Shell->install("List::MoreUtils")' && \
perl -MCPAN -e 'CPAN::Shell->install("Inline::Python")' && \
perl -MCPAN -e 'CPAN::Shell->install("LWP::Simple")' && \
perl -MCPAN -e 'CPAN::Shell->install("JSON")' && \
perl -MCPAN -e 'CPAN::Shell->install("LWP::Protocol::https")';
RUN apt-get update && \
apt-get install --yes cpanminus
RUN cpanm \
CPAN::Meta \
YAML \
DBI \
Digest::SHA \
Module::Build \
Test::Most \
Test::Weaken \
Test::Memory::Cycle \
Clone
# Install perl modules for network and SSL (and their dependencies)
RUN apt-get install --yes \
openssl \
libssl-dev \
liblwp-protocol-https-perl
RUN cpanm \
LWP \
LWP::Protocol::https
# New module for v1.2 annotation
RUN perl -MCPAN -e 'CPAN::Shell->install("Text::NSP::Measures::2D::Fisher::twotailed")'
#############################################
############## python packages ###############
# python packages
RUN pip install pymysql==0.10.1 awscli boto3 pandas docopt fastnumbers tqdm pygr
############## python3 packages ###############
# python3 packages
RUN apt-get update && \
apt-get install -y python3-pip && \
python3 -m pip install numpy && \
python3 -m pip install pandas && \
python3 -m pip install sqlalchemy && \
python3 -m pip install boto3 && \
python3 -m pip install pymysql && \
python3 -m pip install pymongo;
RUN python3 -m pip install pyfaidx
#############################################
#############################################
############# expose tcp ports
EXPOSE 3306/tcp
EXPOSE 80/tcp
EXPOSE 8080
############# RUN entrypoint.sh
# commented out for testing
ENTRYPOINT ["./entrypoint.sh"]
When I install the package pyfaidx with default python3.6, it raises an error. I found that python3.8 can install it. Thus, I want to switch to python3.8 to install all py3 packages.
A Bash alias defined in a RUN statement is available only in that shell session. When the RUN statement finishes executing, the session exits, and any aliases you set up there are forgotten.
See also: How can I set Bash aliases for docker containers in Dockerfile?
Another option is to use update-alternatives, e.g.,
# update-alternatives --install `which python3` python3 `which python3.8` 20
update-alternatives: using /usr/bin/python3.8 to provide /usr/bin/python3 (python3) in auto mode
# python3 --version
Python 3.8.0
This may interfere with other container packages that require Python 3.6, which was the default on Ubuntu 18.04. Furthermore, pip's authors do not recommend using pip to install system-wide packages like that; in fact, newer pip versions emit a warning when pip is used globally as in your Dockerfile.
Therefore a better course of action is using a virtualenv:
# apt install -y python3-venv python3.8-venv
...
# python3.8 -m venv /usr/local/venv
# /usr/local/venv/bin/pip install -U pip setuptools wheel
# /usr/local/venv/bin/pip install -U pyfaidx
... (etc)
You can also "enter" your virtualenv by activating it:
root@a1d0210118a8:/# source /usr/local/venv/bin/activate
(venv) root@a1d0210118a8:/# python -V
Python 3.8.0
See also: Use different Python version with virtualenv.
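In Dockerfile form, the same approach might look like this (the venv path is just an example; `python3.8` here comes from the Ubuntu 18.04 archives as in the answer above):

```dockerfile
FROM ubuntu:18.04
RUN apt-get update && \
    apt-get install -y python3.8 python3-venv python3.8-venv
# Create the virtualenv with 3.8 and put it first on PATH, so that
# plain `python` and `pip` resolve to it in later RUN/CMD instructions
RUN python3.8 -m venv /usr/local/venv
ENV PATH="/usr/local/venv/bin:$PATH"
RUN pip install -U pip setuptools wheel && \
    pip install pyfaidx
```

This leaves the system python3 (3.6) untouched for apt-managed tooling while your own packages run on 3.8.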

Can't install pip and python & ansible using Dockerfile in CentOS

I am trying to install Python, pip, and Ansible using a Dockerfile, but I get this error:
/bin/sh: 1: python: not found
The command '/bin/sh -c curl -O https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py && python -m pip install --upgrade "pip < 21.0" && pip install ansible --upgrade' returned a non-zero code: 127
ERROR: Service 'jenkins' failed to build : Build failed
Here is my Dockerfile:
FROM jenkins/jenkins
USER root
RUN curl -O https://bootstrap.pypa.io/pip/2.7/get-pip.py && \
python get-pip.py && \
python -m pip install --upgrade "pip < 21.0" && \
pip install ansible --upgrade
USER jenkins
Note: I used the same instructions in another Dockerfile and it ran without errors. Here is that Dockerfile, based on the CentOS image:
FROM centos:7
RUN yum update -y && \
yum -y install openssh-server && \
yum install -y passwd
RUN useradd remote_user && \
echo "password" | passwd remote_user --stdin && \
mkdir /home/remote_user/.ssh && \
chmod 700 /home/remote_user/.ssh
COPY remote-key.pub /home/remote_user/.ssh/authorized_keys
RUN chown remote_user:remote_user -R /home/remote_user && \
chmod 600 /home/remote_user/.ssh/authorized_keys
RUN /usr/sbin/sshd-keygen
RUN yum -y install mysql
RUN curl -O https://bootstrap.pypa.io/pip/2.7/get-pip.py && \
python get-pip.py && \
python -m pip install --upgrade "pip < 21.0" && \
pip install awscli --upgrade
CMD /usr/sbin/sshd -D
Since I'm not entirely sure my comments were fully understandable, here is how I would install ansible in your current base image jenkins/jenkins.
Notes:
I pinned the tag to lts, since building from latest is a bit risky. You can change that to whatever tag suits your needs.
That base image is itself based on Ubuntu, not CentOS as your title suggests (hence apt rather than yum/dnf).
I used two RUN directives (one to install Python, the other for Ansible), but you can merge them into a single instruction if you want to further limit the number of layers.
FROM jenkins/jenkins:lts
USER root
RUN apt-get update && \
apt-get install -y python3-pip && \
rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip && \
pip install ansible && \
pip cache purge
USER jenkins
I deleted the RUN instructions and replaced them with:
RUN apt-get update
RUN apt-get install -y ansible
Worked like a charm.

Python creates Folder inside docker image but remove when processing completes

My Python program creates a folder and puts some files in it. But when I run the program inside Docker via CMD,
it creates the folder and the files, yet upon completion the folder is somehow removed, or at least does not show up inside the Docker image.
I have tried the following things:
Check that the folder exists right after creating it - it does.
Check inside the docker image using bash - neither the folder nor its contents show up.
The dockerfile is
FROM ubuntu:18.04
# Upgrade installed packages
RUN apt update
RUN apt upgrade -y
ENV TZ=Europe/London
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install -y libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
WORKDIR /code
RUN apt-get -y install python3-pip
RUN apt-get -y install python3-venv
RUN apt -y install python3-setuptools libffi-dev python3-dev
RUN apt install -y curl
RUN apt install -y unzip
RUN apt-get install -y build-essential swig
WORKDIR /code
RUN python3 -m venv .env
RUN . .env/bin/activate && pip install --upgrade pip && curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | LC_ALL=C.UTF-8 xargs -n 1 -L 1 pip install
COPY requirements.txt requirements.txt
RUN . .env/bin/activate && pip install pyenchant && pip install -r requirements.txt
RUN apt install -y libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
RUN apt-get install -y libenchant1c2a
RUN mkdir embeddings
COPY . .
RUN curl -L http://nlp.stanford.edu/data/glove.6B.zip --output glove.zip
RUN unzip -o glove.zip -d embeddings/
RUN . .env/bin/activate && python nltk_install.py
CMD . .env/bin/activate && python main.py
Changes to the filesystem are not stored in the Docker image. They exist only in the container created from the image, and every docker run command creates a new container.
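If you need the files the program produced, copy them out of the stopped container or mount a host directory when you run it (the container name and paths below are illustrative):

```shell
# Option 1: name the container, then copy results out after it finishes
docker run --name job1 myimage
docker cp job1:/code/output ./output

# Option 2: bind-mount a host directory so the files land outside the container
docker run -v "$PWD/output:/code/output" myimage

# Option 3: bake the changes into a new image (rarely what you want)
docker commit job1 myimage:with-results
```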

python3 mayavi in docker not installing

I am trying to get mayavi working inside a docker container. Originally I started my Dockerfile from continuumio/anaconda3. I did a "conda install mayavi" and it appeared to install, but as soon as I tried to import it, or vtk for that matter, I got:
"ModuleNotFoundError: No module named 'vtkRenderingOpenGL2Python'"
When I try installing it from pip3 it fails to install with "ModuleNotFoundError: No module named 'vtkOpenGLKitPython'"
I have tried starting from centos:7 and get the same issues. I guess it's worth mentioning that a conda search or pip search for these modules comes up blank. However, I can install it outside of Docker and everything goes fine.
If it helps, my current Dockerfile looks like:
FROM centos:7
RUN yum install vim -y
RUN yum install python3 -y
RUN yum install python3-pip -y
RUN yum install python3-devel -y
RUN yum install gcc -y
#RUN pip3 install mayavi
#RUN pip3 install PyQt5
RUN mkdir /home/working
WORKDIR /home/working
I have been at this for some time now and any help would be appreciated.
You can take a look at my Binder repo fork, in which you can load inline Mayavi in Jupyter notebooks.
Pasting the Dockerfile here for posterity:
FROM jupyter/minimal-notebook:65761486d5d3
MAINTAINER Jean-Remi King <jeanremi.king@gmail.com>
# Install core debian packages
USER root
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get -yq dist-upgrade \
&& apt-get install -yq --no-install-recommends \
openssh-client \
vim \
curl \
gcc \
&& apt-get clean
# Xvfb
RUN apt-get install -yq --no-install-recommends \
xvfb \
x11-utils \
libx11-dev \
qt5-default \
&& apt-get clean
ENV DISPLAY=:99
# Switch to notebook user
USER $NB_UID
# Upgrade the package managers
RUN pip install --upgrade pip
RUN npm i npm@latest -g
# Install Python packages
RUN pip install vtk && \
pip install boto && \
pip install h5py && \
pip install nose && \
pip install ipyevents && \
pip install ipywidgets && \
pip install mayavi && \
pip install nibabel && \
pip install numpy && \
pip install pillow && \
pip install pyqt5 && \
pip install scikit-learn && \
pip install scipy && \
pip install xvfbwrapper && \
pip install https://github.com/nipy/PySurfer/archive/master.zip
# Install Jupyter notebook extensions
RUN pip install RISE && \
jupyter nbextension install rise --py --sys-prefix && \
jupyter nbextension enable rise --py --sys-prefix && \
jupyter nbextension install mayavi --py --sys-prefix && \
jupyter nbextension enable mayavi --py --sys-prefix && \
npm cache clean --force
# Try to decrease initial IPython kernel load times
RUN ipython -c "import matplotlib.pyplot as plt; print(plt)"
# Add an x-server to the entrypoint. This is needed by Mayavi
ENTRYPOINT ["tini", "-g", "--", "xvfb-run"]
