Alpine Docker image with pandas installed on Pipenv - python

We need an Alpine-based Docker image that has the pandas package installed through pipenv.
This works.
FROM python:3-alpine
RUN apk add g++ && \
pip install numpy
However, our process requires installing through pipenv, and the following fails with the error pipenv not found:
FROM python:3-alpine
RUN apk add g++ && \
pipenv install numpy
Note that pipenv is installed in earlier Dockerfile statements. However, even the following fails with pipenv not found:
FROM python:3-alpine
RUN apk add g++ && \
pip install --user pipenv && \
pipenv install numpy
Any suggestions?

pipenv isn't available because pip install --user pipenv installs it in /root/.local/bin, which isn't listed in the search path ($PATH). The easiest way to fix it would be to install pipenv without the --user flag. It will then be installed in /usr/local/bin/:
FROM python:3-alpine
RUN apk add g++ && \
pip install pipenv && \
pipenv install numpy
If you run through the build steps manually, pip warns you about this:
docker run --rm -ti python:3-alpine /bin/sh
apk add g++
pip install --user pipenv
The last command shows the following warning:
WARNING: The scripts pipenv and pipenv-resolver are installed in '/root/.local/bin' which is not on PATH.
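You can also check this from Python instead of relying on pip's warning. The sketch below asks `sysconfig` where the POSIX per-user scheme (the one `pip install --user` uses on Linux) places console scripts, and whether that directory is on `$PATH`. The helper names are hypothetical. If you prefer to keep `--user`, the other option is to prepend that directory to `PATH` with an `ENV` instruction in the Dockerfile.

```python
import os
import sysconfig
from typing import Optional


def user_scripts_dir() -> str:
    """Directory where console scripts land under the POSIX per-user install scheme."""
    # For root in a Linux container this is typically /root/.local/bin,
    # unless PYTHONUSERBASE overrides the user base.
    return sysconfig.get_path("scripts", "posix_user")


def is_on_path(directory: str, path_value: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    # Normalise each entry so "/root/.local/bin/" and "/root/.local/bin" compare equal.
    return any(
        os.path.normpath(entry) == target
        for entry in path_value.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(f"pip --user scripts directory: {scripts}")
    print(f"on PATH: {is_on_path(scripts)}")
```

Running this inside the `python:3-alpine` container, after `pip install --user pipenv`, should report that the scripts directory is not on `PATH`.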

Related

Install local package through a Dockerfile

I have started learning Docker, and I have developed a Python package (not published anywhere; it is only used internally) that installs and works fine locally (here I will call it mypackage). However, when I install it in a Docker container, Python in the container fails to recognise it, even though no error was raised during the image build. The Dockerfile looks like this:
# install Ubuntu 20.04
FROM ubuntu:20.04
# update Ubuntu packages
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt upgrade -y
RUN apt install -y apt-utils \
build-essential \
curl \
mysql-server \
libmysqlclient-dev \
libffi-dev \
libssl-dev \
libxml2-dev \
libxslt1-dev \
unzip \
zlib1g-dev
# install Python 3.9
RUN apt-get install -y software-properties-common gcc && \
add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3.9-distutils python3-pip python3-apt python3.9-venv
# make symlink (overriding default Python3.8 installed with Ubuntu)
RUN rm /usr/bin/python3
RUN ln -s /usr/bin/python3.9 /usr/bin/python3
# copy package files and source code
RUN mkdir mypackage
COPY pyproject.toml setup.cfg setup.py requirements.txt ./mypackage/
COPY src mypackage/src/
# add path
ENV PACKAGE_PATH=/mypackage/
ENV PATH="$PACKAGE_PATH/:$PATH"
# install mypackage
RUN pip3 install -e ./mypackage
CMD ["python3.9", "main.py"]
So the above builds successfully, but if I run sudo docker run -it test_image bin/bash and then pip3 list, the package is not listed, and code that depends on mypackage raises a ModuleNotFoundError. Interestingly, if I create a virtual environment by replacing this:
ENV PACKAGE_PATH=/mypackage/
ENV PATH="$PACKAGE_PATH/:$PATH"
by this:
ENV VIRTUAL_ENV=/opt/venv
RUN python3.9 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
it works. Ideally, I want to know why I need to create a virtual environment, and how I can run local packages in a container without creating virtual environments.
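For this kind of symptom, a reasonable first check is whether `pip3` and `python3.9` belong to the same interpreter. A package installed by one interpreter's pip is not visible to another interpreter. `pip3 --version` prints the directory pip runs from and the Python version it runs under. The sketch below prints the matching details for whichever interpreter runs it, plus where each named top-level module would be imported from. The script and helper names are illustrative only.

```python
import importlib.util
import sys
import sysconfig
from typing import Optional


def interpreter_report() -> dict:
    """Summarise the running interpreter and the directory it installs packages into."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix points at the venv while base_prefix points at the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        "purelib": sysconfig.get_paths()["purelib"],
    }


def module_origin(name: str) -> Optional[str]:
    """Return where a top-level module would be imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
    for module in sys.argv[1:]:
        print(f"{module}: {module_origin(module) or 'NOT importable'}")
```

For example, run `python3.9 check_env.py mypackage` in the container, where `check_env.py` is a hypothetical file holding this code, and compare the result with `pip3 --version`. Installing with `python3.9 -m pip install -e ./mypackage` is a common way to tie pip to the interpreter that will later run the code.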

How do I install a specific python version without pyenv within nvidia docker so that it interacts well with poetry?

I am trying to build a Docker image that contains CUDA, cuDNN and Python, each at a specific version that can be templated, as a base for downstream users.
(In this example I have replaced all the irrelevant templating with hard-coded versions; this is just FYI as motivation.)
Please note that the following questions are not duplicates:
How to install python in a docker image? (does not involve poetry)
Integrating Python Poetry with Docker (does not concern itself with installing dependencies)
How do I integrate pyenv, poetry, and docker? (this works for me already; I am looking for a different solution)
I have achieved what I want using pyenv to install the specific python version within docker inside the nvidia image.
However, this solution is not optimal since the resulting image is about 1.5GB larger than what I think should be possible. Sidenote: I know that there are other ways to reduce the image size further that I have not done in this example. This is not the question here.
I have prepared a dummy pyproject.toml and poetry.lock to demonstrate the issue that I am currently facing:
pyproject.toml
[tool.poetry]
name = "example_project"
version = "1.0.0"
description = ""
authors = ["RunOrVeith"]
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
scipy = "^1.9.3"
[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"
Working Dockerfile.pyenv
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ARG PYTHON_VERSION=3.8
ENV DEBIAN_FRONTEND=noninteractive
# Set-up necessary Env vars for PyEnv
ENV PYENV_ROOT /root/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
ENV PATH="/root/.local/bin/:$PATH"
# Install essentials for pyenv https://github.com/pyenv/pyenv/wiki
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget ca-certificates \
curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev mecab-ipadic-utf8 git \
&& rm -rf /var/lib/apt/lists/*
# Install pyenv
RUN set -ex \
&& curl https://pyenv.run | bash \
&& pyenv update \
&& pyenv install $PYTHON_VERSION \
&& pyenv global $PYTHON_VERSION \
&& pyenv rehash \
&& pip install --upgrade pip
# Install poetry
RUN curl -sSL https://install.python-poetry.org | python - \
&& poetry --version && poetry config virtualenvs.create false
# The template that I want to provide ends here; the stage below is just for demoing the issue
FROM base as example
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
The version that doesn't work, Dockerfile.plain:
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 as base
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHON_VERSION=3.8
ENV PATH="/root/.local/bin/:$PATH"
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys A4B469963BF863CC \
&& apt update \
&& apt install -y git curl \
&& apt install -y --no-install-recommends make build-essential
# Don't be confused, distutils-3.9 also installs python 3.8 https://github.com/deadsnakes/issues/issues/150
RUN apt install -y --no-install-recommends python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-distutils python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python python /usr/bin/python${PYTHON_VERSION} 10 \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 10 \
&& apt-get install -y --no-install-recommends python3-pip python3-setuptools \
&& update-alternatives --install /usr/local/bin/pip pip /usr/bin/pip 10 \
&& update-alternatives --install /usr/local/bin/pip3 pip3 /usr/bin/pip 10 \
&& apt-get clean
WORKDIR /virtualenvs
RUN curl -sSL https://install.python-poetry.org | python${PYTHON_VERSION} - \
&& poetry --version && poetry config virtualenvs.create false
FROM base as example
WORKDIR /app
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry install --no-interaction --no-ansi
You can build this using
DOCKER_BUILDKIT=1 docker build -t github:example-plain --target example -f Dockerfile.plain .
and then run using
docker run -it github:example-plain bash
Here is the issue:
All the following commands are run from within the docker image.
According to poetry, everything is installed:
root#5e1ffb1f971c:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root#5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
However, regular pip lists nothing, and imports fail.
Importing through poetry does not work either.
root#5e1ffb1f971c:/app# pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
root#5e1ffb1f971c:/app# pip freeze
root#5e1ffb1f971c:/app# python -c "import scipy"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
root#5e1ffb1f971c:/app# poetry run python -c "import scipy"
Skipping virtualenv creation, as specified in config file.
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
What is also interesting: if I upgrade pip through poetry, it tells me it can't uninstall pip. I assume this is due to the Ubuntu patch that tries to prevent me from breaking the system (even though I am only installing pip).
Afterwards, the pip executable that poetry uses also points somewhere else.
root#5e1ffb1f971c:/app# poetry run pip install --upgrade pip
Skipping virtualenv creation, as specified in config file.
Collecting pip
Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-22.3.1
root#5e1ffb1f971c:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
So how do I set this up so that I get a fresh python install of whichever version I configure, and it works with poetry? It is also required that the python and python3 aliases point to whatever poetry is using.
Reference with working version:
If I run the same commands with the working pyenv version, it looks like this:
root#c0a9af7f05b4:/app# pip freeze
numpy==1.23.4
scipy==1.9.3
root#c0a9af7f05b4:/app# poetry show
Skipping virtualenv creation, as specified in config file.
numpy 1.23.4 NumPy is the fundamental package for array computing with Python.
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
root#c0a9af7f05b4:/app# poetry run pip --version
Skipping virtualenv creation, as specified in config file.
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)
root#c0a9af7f05b4:/app# pip --version
pip 22.3.1 from /root/.pyenv/versions/3.8.15/lib/python3.8/site-packages/pip (python 3.8)
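In the plain image, it may be worth checking whether the install directories reported by the interpreter's `sysconfig` actually appear on its `sys.path`. The sketch below is a hypothetical helper for that comparison. It only reports a mismatch and does not establish that one is the cause here.

```python
import os
import sys
import sysconfig
from typing import Iterable, List


def missing_from_search_path(install_dirs: Iterable[str], search_path: Iterable[str]) -> List[str]:
    """Return install directories (deduplicated, in order) that are not on the search path."""
    searched = {os.path.normpath(p) for p in search_path if p}
    # dict.fromkeys keeps first-seen order while dropping duplicates (purelib often equals platlib).
    candidates = dict.fromkeys(os.path.normpath(d) for d in install_dirs)
    return [d for d in candidates if d not in searched]


if __name__ == "__main__":
    paths = sysconfig.get_paths()
    missing = missing_from_search_path([paths["purelib"], paths["platlib"]], sys.path)
    print(f"interpreter: {sys.executable}")
    print("install dirs missing from sys.path:", missing or "none")
```

Run it both as `python script.py` and as `poetry run python script.py`. This shows whether the two resolve to the same interpreter and whether that interpreter's install directories are importable.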

'No module named PyInstaller' after what appears to be a successful install

I am building a Docker image in which I install a number of Python packages within one RUN. All packages in that command are installed correctly except PyInstaller, even though the build log suggests that it was: Successfully installed PyInstaller
The minimal Dockerfile to reproduce the issue:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip \
unixodbc-dev
RUN python3 -m pip install --no-cache-dir pyodbc==4.0.30 && \
python3 -m pip install --no-cache-dir Cython==0.29.19 && \
python3 -m pip install --no-cache-dir PyInstaller==3.5 && \
python3 -m pip install --no-cache-dir selenium==3.141.0 && \
python3 -m pip install --no-cache-dir bs4==0.0.1
RUN python3 -m PyInstaller
The last run command fails with /usr/bin/python3: No module named PyInstaller, all other packages can be imported as expected.
The issue is also reproducible with this Dockerfile:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip
RUN python3 -m pip install --no-cache-dir PyInstaller==3.5
RUN python3 -m PyInstaller
What is the reason for this issue and what is the fix?
EDIT:
When I run a container from the layer before the last RUN, I can see that PyInstaller is not installed, but I can run python3 -m pip install --no-cache-dir PyInstaller==3.5 and then it works without changing anything else.
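The sketch below separates two questions: did pip record the distribution, and can the module be imported? It compares installed distribution metadata with an import lookup. It assumes Python 3.8+ for `importlib.metadata`. Debian buster's `python3` is 3.7, where the `importlib_metadata` backport would be needed instead. The function name is hypothetical.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def install_status(dist_name: str, module_name: str) -> dict:
    """Compare pip-visible distribution metadata with whether the module is importable."""
    try:
        version: Optional[str] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "distribution_version": version,
        "importable": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # PyInstaller's distribution and top-level module are both named "PyInstaller".
    print(install_status("PyInstaller", "PyInstaller"))
```

If a version is recorded but `importable` is False, the files are not on the interpreter's search path. If both come back empty, nothing was installed at all, which matches the observation in the EDIT.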
Although I do not fully understand the reason behind it, it seems like the --no-cache-dir option was causing the issue. The Dockerfile below builds without an issue:
FROM debian:buster
RUN apt-get update && \
apt-get install -y \
python3 \
python3-pip
RUN python3 -m pip install PyInstaller==3.5
RUN python3 -m PyInstaller --help
Edit: This seems to be an issue not with PyInstaller itself but with the specific version of pip; see https://github.com/pyinstaller/pyinstaller/issues/6963 for details.
I'm not familiar with PyInstaller, but their requirements page says:
If the pip setup fails to build a bootloader, or if you do not use pip
to install, you must compile a bootloader manually. The process is
described under Building the Bootloader.
Have you tried that in your Dockerfile?
(And you're totally right, it should fail... )

Docker image build: How to install python packages google-cloud-bigquery and numpy, scipy and pandas (Miniconda3) for an armv7 architecture?

I'm trying to build a Docker image that runs a Python script which needs numpy, scipy, pandas and google-cloud-bigquery.
Since this image is built for the armv7 architecture, it's a pain to install numpy, scipy and pandas directly (it takes too long and eventually breaks). So I decided to use Miniconda with the packages for Raspberry Pi. That worked fine (the installation completes during the image build).
Now I'm trying to install the Google packages google-crc32c==1.1.2 and google-cloud-bigquery. With pip this is possible and the image builds properly. But when I run a container from this image, it keeps restarting and gives me this error log:
File "/usr/src/app/bigquery.py", line 1, in <module>
from google.cloud import bigquery
ImportError: No module named 'google'
I think I have to install the google packages with conda but there are no packages for armv7 architecture available:
google-cloud-bigquery package on Anaconda.org: https://anaconda.org/search?q=google+bigquery
google-crc32c package on Anaconda.org: https://anaconda.org/search?q=google-crc32c
Is there a possibility to install those google packages with Miniconda for armv7 architecture?
Or is there another way to install numpy, scipy and pandas without using Miniconda (but also without installing them directly)?
Thank you for any help!
Dockerfile:
FROM python:3.7-buster
WORKDIR /usr/src/app
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
COPY main_prog.py bigquery.py requirements.txt ./
RUN wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-armv7l.sh
RUN mkdir /root/.conda
RUN /bin/bash Miniconda3-latest-Linux-armv7l.sh -b
RUN rm -f Miniconda3-latest-Linux-armv7l.sh \
&& echo "Running $(conda --version)"
RUN wget https://github.com/jjhelmus/berryconda/releases/download/v2.0.0/Berryconda3-2.0.0-Linux-armv7l.sh
RUN chmod +x Berryconda3-2.0.0-Linux-armv7l.sh ./Berryconda3-2.0.0-Linux-armv7l.sh
RUN conda list \
&& conda config --add channels rpi \
&& conda install python=3.6 -y\
&& conda install openblas blas -y\
&& conda install numpy -y\
&& conda install pandas -y\
&& conda install scipy -y
RUN pip install --upgrade pip
RUN pip install "google-crc32c==1.1.2"
RUN pip install google-cloud-bigquery
CMD ["python", "main_prog.py"]
I couldn't find a way to install all packages with Miniconda.
But it was possible for me to install them directly with wheels from piwheels.
Therefore, I had to add a pip.conf file in the /etc directory.
content of pip.conf:
[global]
extra-index-url=https://www.piwheels.org/simple
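pip config files use INI syntax, and multi-valued options such as `extra-index-url` take whitespace-separated URLs. The sketch below parses a pip.conf like the one above with the standard library's configparser and lists the extra index URLs it declares. It only checks the file's content and is not how pip itself loads configuration. `pip config list` shows what pip actually picks up.

```python
import configparser
from typing import List


def extra_index_urls(conf_text: str, section: str = "global") -> List[str]:
    """Return the extra-index-url entries declared in a pip.conf section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback covers both a missing section and a missing option.
    raw = parser.get(section, "extra-index-url", fallback="")
    # Several URLs may be given, separated by spaces or newlines.
    return raw.split()


PIP_CONF = """\
[global]
extra-index-url=https://www.piwheels.org/simple
"""

if __name__ == "__main__":
    print(extra_index_urls(PIP_CONF))
```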
In addition I had to install libatlas-base-dev.
I could only do this by adding the line deb http://ftp.de.debian.org/debian buster main (as recommended here) to my sources.list in the /etc/apt/ directory.
content of sources.list:
# deb http://snapshot.debian.org/archive/debian/20210902T000000Z buster main
deb http://deb.debian.org/debian buster main
# deb http://snapshot.debian.org/archive/debian-security/20210902T000000Z buster/updates main
deb http://security.debian.org/debian-security buster/updates main
# deb http://snapshot.debian.org/archive/debian/20210902T000000Z buster-updates main
deb http://deb.debian.org/debian buster-updates main
deb http://ftp.de.debian.org/debian buster main
Dockerfile:
FROM python:3.7-buster
WORKDIR /usr/src/app
COPY main_prog.py bigquery.py requirements.txt pip.conf sources.list ./
RUN mv ./pip.conf /etc
# `export` inside RUN only lasts for that RUN's shell; ENV persists into later layers
ENV PIP_CONFIG_FILE=/etc/pip.conf
RUN mv ./sources.list /etc/apt/
RUN apt-get update \
&& apt-get upgrade -y
RUN apt-get install libatlas-base-dev -y
RUN pip3 install --upgrade pip
RUN pip3 install numpy \
&& pip3 install scipy \
&& pip3 install pandas \
&& pip3 install google-crc32c \
&& pip3 install google-cloud-bigquery
CMD ["python", "main_prog.py"]

apache-airflow fails install

I'm trying to install apache-airflow the recommended way with pip install apache-airflow. During the install of pendulum (a dependency), I get an error:
error: can't copy 'pendulum/parsing': doesn't exist or not a regular file
I think it's related to Python distutils error: "[directory]... doesn't exist or not a regular file", but that doesn't give an answer as to how one resolves this when using pip. Pulling the tar for pendulum and installing using python setup.py install works, but then when subsequently I do pip install apache-airflow again, it sees that pendulum is already installed, UNINSTALLS, and then tries to install again using pip, resulting in the same error. I'm using a docker container and installing python-setuptools with apt-get before I do any of this. Here's my dockerfile, fwiw...
FROM phusion/baseimage:0.10.1
MAINTAINER a curious dev
RUN apt-get update && apt-get install -y python-setuptools python-pip python-dev libffi-dev libssl-dev zip wget
ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
RUN wget https://files.pythonhosted.org/packages/5b/57/71fc910edcd937b72aa0ef51c8f5734fbd8c011fa1480fce881433847ec8/pendulum-2.0.4.tar.gz
RUN tar -xzvf pendulum-2.0.4.tar.gz
RUN cd pendulum-2.0.4/ && python setup.py install
RUN pip install apache-airflow
CMD airflow initdb && airflow webserver -p 8080
Does anyone see anything I'm doing wrong? I haven't found anyone else with this error so I think there's something really obvious I'm missing. Thanks for reading.
Upgrade pip first.
FROM phusion/baseimage:0.10.1
RUN apt-get update && apt-get install -y python-setuptools python-pip python-dev libffi-dev libssl-dev zip wget
ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
RUN pip install -U pip
RUN pip install apache-airflow
CMD airflow initdb && airflow webserver -p 8080
This seems to work fine for me.
