Minimal SciPy Dockerfile

I have a Dockerfile like the following, app code is omitted:
FROM python:3
# Binary dependencies
RUN apt update && apt install -y gfortran libopenblas-dev liblapack-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib
It works fine but produces an image of 1.75 GB (while my code is only about 50 MB). How can I reduce this huge size?
I also tried to use Alpine Linux, like this:
FROM python:3-alpine
# Binary dependencies for numpy & scipy; though the second one doesn't work anyway
RUN apk add --no-cache --virtual build-dependencies \
gfortran gcc g++ libstdc++ \
musl-dev lapack-dev freetype-dev python3-dev
# For mysqlclient
RUN apk --no-cache add mariadb-dev
# Wanted Python packages
RUN python3 -m pip install mysqlclient numpy scipy pandas matplotlib
But Alpine leads to many strange errors. The Dockerfile above produces this error:
File "scipy/odr/setup.py", line 28, in configuration
blas_info['define_macros'].extend(numpy_nodepr_api['define_macros'])
KeyError: 'define_macros'
So, how can one get the smallest possible (or at least a smaller) Python 3 image with the packages mentioned?

There are several things you can do to make your Docker image smaller.
Use the python:3-slim Docker image as a base. The -slim images do not include packages needed for compiling software.
Pin the Python version, let's say to 3.8. Some packages do not have wheel files for python 3.9 yet, so you might have to compile them. It is good practice, in general, to use a more specific tag because the python:3-slim tag will point to different versions of python at different points in time.
You can also omit the installation of gfortran, libopenblas-dev, and liblapack-dev. Those packages are necessary for building numpy/scipy, but if you install the wheel files, which are pre-compiled, you do not need to compile any code.
Use --no-cache-dir in pip install to disable the cache. If you do not include this, then pip's cache counts toward the Docker image size.
There are no Linux wheels for mysqlclient, so you will have to compile it. You can install the build dependencies, install the package, and then remove the build dependencies in a single RUN instruction. Keep in mind that libmariadb3 is a runtime dependency of this package.
Here is a Dockerfile that implements the suggestions above. It makes a Docker image 354 MB large.
FROM python:3.8-slim
# Install mysqlclient (must be compiled).
RUN apt-get update -qq \
&& apt-get install --no-install-recommends --yes \
build-essential \
default-libmysqlclient-dev \
# Necessary for mysqlclient runtime. Do not remove.
libmariadb3 \
&& rm -rf /var/lib/apt/lists/* \
&& python3 -m pip install --no-cache-dir mysqlclient \
&& apt-get autoremove --purge --yes \
build-essential \
default-libmysqlclient-dev
# Install packages that do not require compilation.
RUN python3 -m pip install --no-cache-dir \
numpy scipy pandas matplotlib
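To verify the effect, build the image and check its size (the image tag here is just an example):
docker build -t python-scipy-slim .
docker images python-scipy-slim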
Using Alpine Linux was a good idea, but Alpine uses musl instead of glibc, so it is not compatible with most pip wheels. The result is that you would have to compile numpy/scipy yourself.

Related

How do I install a specific python version without pyenv within nvidia docker so that it interacts well with poetry?

I am trying to build a Docker image that contains CUDA, cuDNN, and Python, each with specific versions that are templatable, as a base for downstream users.
(In this example I have replaced all the irrelevant templating with hard-coded versions; this is just FYI as motivation.)
Please note that the following questions are not duplicates:
How to install python in a docker image? (does not involve poetry)
Integrating Python Poetry with Docker (does not concern itself with installing dependencies)
How do I integrate pyenv, poetry, and docker? (this works for me already; I am looking for a different solution)
I have achieved what I want using pyenv to install the specific python version within docker inside the nvidia image.
However, this solution is not optimal, since the resulting image is about 1.5 GB larger than what I think should be possible. Side note: I know there are other ways to reduce the image size further that I have not applied in this example; that is not the question here.
I have prepared a dummy pyproject.toml and poetry.lock to demonstrate the issue that I am currently facing:
pyproject.toml
[tool.poetry]
name = "example_project"
version = "1.0.0"
description = ""
authors = ["RunOrVeith"]
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
scipy = "^1.9.3"
[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"
Working Dockerfile.pyenv
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ARG PYTHON_VERSION=3.8
ENV DEBIAN_FRONTEND=noninteractive
# Set-up necessary Env vars for PyEnv
ENV PYENV_ROOT /root/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
ENV PATH="/root/.local/bin/:$PATH"
# Install essentials for pyenv https://github.com/pyenv/pyenv/wiki
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget ca-certificates \
curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev mecab-ipadic-utf8 git \
&& rm -rf /var/lib/apt/lists/*
# Install pyenv
RUN set -ex \
&& curl https://pyenv.run | bash \
&& pyenv update \
&& pyenv install $PYTHON_VERSION \
&& pyenv global $PYTHON_VERSION \
&& pyenv rehash \
&& pip install --upgrade pip
# Install poetry
RUN curl -sSL https://install.python-poetry.org | python - \
&& poetry --version && poetry config virtualenvs.create false
# The template that I want to provide ends here; the rest just demonstrates the issue
FROM base as example
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
The version that doesn't work: Dockerfile.plain
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHON_VERSION=3.8
ENV PATH="/root/.local/bin/:$PATH"
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys A4B469963BF863CC \
&& apt update \
&& apt install -y git curl \
&& apt install -y --no-install-recommends make build-essential
# Don't be confused, distutils-3.9 also installs python 3.8 https://github.com/deadsnakes/issues/issues/150
RUN apt install -y --no-install-recommends python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-distutils python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python python /usr/bin/python${PYTHON_VERSION} 10 \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 10 \
&& apt-get install -y --no-install-recommends python3-pip python3-setuptools \
&& update-alternatives --install /usr/local/bin/pip pip /usr/bin/pip 10 \
&& update-alternatives --install /usr/local/bin/pip3 pip3 /usr/bin/pip 10 \
&& apt-get clean
WORKDIR /virtualenvs
RUN curl -sSL https://install.python-poetry.org | python${PYTHON_VERSION} - \
&& poetry --version && poetry config virtualenvs.create false
FROM base as example
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
You can build this using
DOCKER_BUILDKIT=1 docker build -t github:example-plain --target example -f Dockerfile.plain .
and then run using
docker run -it github:example-plain bash
Here is the issue:
All the following commands are run inside the resulting Docker container.
According to poetry, everything is installed:
root@5e1ffb1f971c:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root@5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
However, using regular pip there is nothing, and imports also fail. Importing via poetry does not work either:
root@5e1ffb1f971c:/app# pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
root@5e1ffb1f971c:/app# pip freeze
root@5e1ffb1f971c:/app# python -c "import scipy"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
root@5e1ffb1f971c:/app# poetry run python -c "import scipy"
Skipping virtualenv creation, as specified in config file.
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
What is also interesting is that if I upgrade pip with poetry, it tells me it can't uninstall pip. I assume this is due to this Ubuntu patch that tries to prevent me from breaking the system (even though I just installed pip).
Afterwards, the pip executable that poetry uses also points somewhere else.
root@5e1ffb1f971c:/app# poetry run pip install --upgrade pip
Skipping virtualenv creation, as specified in config file.
Collecting pip
Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-22.3.1
root@5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
So how do I set this up so that I get a fresh python install of whichever version I configure, and it works with poetry? It is also required that the python and python3 aliases point to whatever poetry is using.
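A quick way to check which interpreter and site-packages each tool resolves to (generic diagnostic commands, not specific to this setup):
python -c "import sys; print(sys.executable)"
python -m pip --version
poetry run python -c "import sys; print(sys.executable)"
poetry run pip --version
If the paths differ between the plain commands and the poetry run ones, the two tools are operating on different installations.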
Reference with working version:
If I do the same commands with the working version using pyenv, it looks like this:
root@c0a9af7f05b4:/app# pip freeze
numpy==1.23.4
scipy==1.9.3
root@c0a9af7f05b4:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root@c0a9af7f05b4:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)
root@c0a9af7f05b4:/app# pip --version
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)

Problem building docker with numpy and pandas over arm64

I'm trying to build a Docker image with docker-compose on my ARM64 Raspberry Pi, but it seems to be impossible.
This is my dockerfile:
FROM python:3.6-slim
RUN apt-get update && apt-get -y install python3-dev
RUN apt-get -y install python3-numpy
RUN apt-get -y install python3-pandas
ENTRYPOINT ["python3", "app.py"]
It seems to be OK, but when app.py runs it fails with "Module numpy not found", and the same happens for the pandas module.
If I try to install numpy and pandas using pip:
RUN pip install numpy pandas
It gives me an error or, more often, the Raspberry Pi just freezes and I have to unplug it to recover.
I have tried different Python versions for the base image, and also several Ubuntu images with Python installed on top.
Any idea how I can install numpy and pandas in Docker on my Raspberry Pi (ARM64)?
Thanks
The problem seems to be with the Python version. I'm using a python3.6 Docker image, but both the python3-numpy and python3-pandas packages require Python 3.5, so installing them pulls in another version of Python as well. This is why the interpreter can't find those modules when I try to import them: they are installed for a different Python version.
Finally, I solved it by using a generic Debian Docker image and installing Python 3.5 myself, instead of using a Python Docker image.
FROM debian:stretch-slim
RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y install build-essential libssl-dev libffi-dev python3.5 libblas3 libc6 liblapack3 gcc python3-dev python3-pip cython3
RUN apt-get -y install python3-numpy python3-sklearn
RUN apt-get -y install python3-pandas
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt
(Disclaimer: the Raspberry Pi 3 B+ is probably too slow to install big dependencies like numpy.)
This Dockerfile worked for me on the Raspberry Pi 3 B+ with software version Linux raspberrypi 5.10.63-v7+ (consider updating it):
FROM python:3.9-buster
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
I am not sure, but I think it also helped to clean up Docker, i.e. remove all images and containers, with the following commands.
Warning: these commands delete all images and containers!
$ docker container prune
$ docker image prune -a
Or reset Docker completely (this also deletes volumes and networks):
$ docker system prune --volumes
I recommend creating a requirements.txt file. Inside it you declare the packages to install.
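A minimal requirements.txt for the packages discussed in this thread might look like this (pin versions as needed):
numpy
pandas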
The Dockerfile:
FROM python
COPY app.py /workdir/
COPY requirements.txt /workdir/
WORKDIR /workdir
RUN pip install --trusted-host pypi.python.org -r requirements.txt
CMD python app.py
Edit: I created a Dockerfile that installs the pandas library and then checks whether it works:
cat Dockerfile
FROM python
COPY app.py /workdir/
WORKDIR /workdir
RUN python -m pip install pandas
CMD python app.py
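For completeness, a minimal app.py to confirm that the install works could be as simple as this (a hypothetical stand-in for the asker's actual app):
import pandas
print(pandas.__version__)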

Cannot load CLoader with pyyaml

I'm working on a python project using pyyaml. I need to run it in a Docker container based on bitnami/minideb:jessie. Python version is 2.7.9.
The original code is using CLoader and I cannot change it currently.
Any reason why CLoader fails to load while Loader is fine?
>>> import yaml
>>> yaml.__version__
'3.12'
>>> from yaml import Loader
>>> from yaml import CLoader
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name CLoader
>>>
I cannot figure out what I'm missing here. Any ideas?
Running it from the Docker image python:2.7.9 does not raise any error:
$ docker run -ti python:2.7.9 bash
/# python
>>> from yaml import CLoader
>>> from yaml import Loader
>>>
By default, PyYAML's setup.py script checks whether LibYAML is installed and, if so, builds and installs the LibYAML bindings. The following is the minimum to get CLoader compiled and installed:
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y \
python3 python3-dev python3-pip gcc libyaml-dev
RUN pip3 install pyyaml
# verify
RUN python3 -c "import yaml; yaml.CLoader"
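As an additional check, PyYAML exposes a flag that reports whether the C bindings were built, so something like
python3 -c "import yaml; print(yaml.__with_libyaml__)"
should print True when CLoader is available.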
I ran into the same problem. You need to install the libyaml-dev package, then install libyaml and pyyaml from source. Here's the complete Dockerfile for minideb:jessie:
FROM bitnami/minideb:jessie
RUN apt-get update
RUN apt-get install -y \
automake \
autoconf \
build-essential \
git-core \
libtool \
libyaml-dev \
make \
python \
python-dev \
python-pip
RUN pip install --upgrade pip
RUN pip install Cython==0.29.10
RUN mkdir /libyaml
WORKDIR /libyaml
RUN git clone https://github.com/yaml/libyaml.git . && \
git checkout dist-0.2.2 && \
autoreconf -f -i && \
./configure && \
make && \
make install
RUN mkdir /pyyaml
WORKDIR /pyyaml
RUN git clone https://github.com/yaml/pyyaml.git . && \
git checkout 5.1.1 && \
python setup.py install
RUN python -c "import yaml; from yaml import CLoader; print 'Loaded CLoader!'"
A couple of additions to others' solutions:
If you want the install command to hard-fail when the libyaml C extension won't build (instead of silently falling back to a pure-Python-only install), you can pass the --with-libyaml global option, e.g. python setup.py --with-libyaml install.
If you're doing this with something that might ever need to be upgraded (e.g. implicitly via another package requiring a higher pyyaml version), it's better to use pip instead of calling setup.py directly, as the latter (currently) uses a pure distutils installation, which pip will fail to uninstall later. You'll see an error like "ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall."
Doing the required extension build with pip looks something like pip install --global-option='--with-libyaml' pyyaml.
I'm just copying the developer's answer from the issue linked above, but this happens because pyyaml only installs the libyaml bindings (CLoader & co.) if it finds the libyaml-dev package (that's the debian package, anyway) at install time. If it doesn't find it, it prints a warning and skips the libyaml bindings.
So, install libyaml-dev before installing pyyaml.
I tried all the steps mentioned, and the following fixed my issue.
Install
apt-get install -y gcc libyaml-dev
pip install --ignore-installed --global-option='--with-libyaml' pyyaml
Test
python -c "import yaml; yaml.CLoader"

pip install letsencrypt: how do I know which packages are needed?

I'm reading this dockerfile for letsencrypt on Alpine:
https://github.com/CognitiveScale/lets-alpine/blob/master/Dockerfile
As far as I know, if I installed just pip with apk (or with apt-get on Ubuntu), shouldn't the package manager also download any other libraries needed for pip to work? Why does this list of libs have to be typed out in the Dockerfile?
RUN apk add --update \
python python-dev py-pip \
gcc musl-dev linux-headers \
augeas-dev openssl-dev libffi-dev ca-certificates dialog \
&& rm -rf /var/cache/apk/*
I'm asking this because, if I want to create my own images based on Alpine, how am I going to know all the needed libs?
These Alpine packages are not needed for pip itself; presumably they are needed to build the Python modules that you will install with pip later.
You need to read the module descriptions to determine their dependencies. Alternatively, you can follow the trial-and-error route and add the required Alpine packages whenever a Python module fails to build.
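For example (a typical failure mode, not one taken from the original post): if a module build stops with a missing C header such as ffi.h: No such file or directory, that usually maps to a -dev package, which on Alpine you would add before retrying the install:
apk add --no-cache libffi-dev
pip install cffi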

Installing XGBoost

I am trying to use the XGBoost package, but I am having trouble installing it. I am following the installation guide here: https://xgboost.readthedocs.io/en/latest/build.html#python-package-installation. I have successfully built xgboost for OSX using
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/minimum.mk ./config.mk; make -j4
However, when I try to install the python package in my terminal using this code
cd python-package; sudo python setup.py install
I get the error python: command not found. I am not sure why, because I have Python installed and I can run IPython notebooks. Python is installed on my computer at /usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7. Do I need to add a path in my bash_profile to access it? I don't understand why I can't use python from the command line.
I have answered a similar issue in this question. You can install the xgboost library along with the other essential libraries as follows (please pick the libraries sufficient for your project); my main focus in this answer is to make it helpful for setting up most data science projects that require sklearn, pandas, scipy, and xgboost algorithms, along with visualization libraries.
# installing essentials
apt-get update; \
apt-get install -y \
python python-pip \
build-essential \
python-dev \
python-setuptools \
python-matplotlib \
libatlas-dev \
curl \
libatlas3gf-base && \
apt-get clean
# upgrading pip
curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
# installing libraries
pip install numpy==1.13.1
pip install scipy
pip install -U scikit-learn
pip install seaborn
pip install --pre xgboost
If you're still having environment issues, I would suggest using this Dockerfile. You might also find Datmo conversion useful to facilitate this.
DISCLAIMER: I work at this company called Datmo, which is building a community of developers by simplifying the machine learning workflow.
If you have python in your /usr/bin directory, all you need to do is add that directory to your PATH.
Add this line to your .bash_profile and restart your shell:
export PATH="$PATH:/usr/bin"
Then you should be able to use any of the Python versions in your /usr/bin directory: python, python3, etc. Hope this helps.
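A few generic commands to confirm what is actually on your PATH (nothing here is specific to the asker's machine):
which -a python python3
echo "$PATH"
ls /usr/bin/python*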
