I want to install some packages with pip in a container. The trivial way to do this is the following:
FROM ubuntu:trusty
RUN apt-get update && \
    apt-get install -y python-pip <lots-of-dependencies-needed-only-for-pip-install>
RUN pip install <some-packages>
However, this way I install a lot of unneeded dependencies, which increases the size of the image unnecessarily.
My first idea was to do this:
FROM ubuntu:trusty AS pip_install
RUN apt-get update && \
    apt-get install -y python-pip <lots-of-dependencies-needed-only-for-pip-install>
RUN pip install <some-packages>
FROM ubuntu:trusty
RUN apt-get update && \
    apt-get install -y python-pip <runtime-dependencies>
COPY --from=pip_install /usr/local/bin /usr/local/bin
COPY --from=pip_install /usr/local/lib/python2.7 /usr/local/lib/python2.7
This works, but feels like a workaround. Is there any more elegant way of doing this? I thought of something like this:
FROM ubuntu:trusty AS pip_install
RUN apt-get update && \
    apt-get install -y python-pip <lots-of-dependencies-needed-only-for-pip-install>
RUN pip install <some-packages>
VOLUME /usr/local
FROM ubuntu:trusty
<somehow mount /usr/local from pip_install to /tmp/pip>
RUN apt-get update && \
    apt-get install -y python-pip <runtime-dependencies>
RUN pip install <from /tmp/pip> <some-packages>
Is this even possible?
I could have used one of the python images, but in my real application I derive from another image that itself derives from ubuntu:trusty, so for this question that's beside the point.
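For what it's worth, BuildKit's RUN --mount seems able to express almost exactly this idea: it binds a directory from another stage into the filesystem for the duration of a single RUN, without copying anything into the final image. A sketch, assuming BuildKit is enabled (DOCKER_BUILDKIT=1) and a pip new enough to build wheels; the /wheels path is just an illustrative name:
# syntax=docker/dockerfile:1
FROM ubuntu:trusty AS pip_install
RUN apt-get update && \
    apt-get install -y python-pip <lots-of-dependencies-needed-only-for-pip-install>
# build wheels instead of installing, so a later stage can consume them
RUN pip wheel --wheel-dir=/wheels <some-packages>

FROM ubuntu:trusty
RUN apt-get update && \
    apt-get install -y python-pip <runtime-dependencies>
# the mount exists only for this step; nothing from pip_install lands in the image
RUN --mount=type=bind,from=pip_install,source=/wheels,target=/tmp/pip \
    pip install --no-index --find-links=/tmp/pip <some-packages>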
Related
I am building an Ubuntu Docker image that is going to run my Python application, and some of my libraries require Python <= 3.6 to work; otherwise they throw errors.
My problem is that when I install pip, it always automatically uses Python 3.8, and I'm not sure how to make pip use an older version of Python. This is the installation in my Dockerfile:
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-add-repository universe && \
    apt-get update && \
    apt-get install -y \
        libmysqlclient-dev \
        netcat \
        python3 \
        python-dev \
        build-essential \
        python3-setuptools \
        python3-pip \
        supervisor && \
    pip install -U pip setuptools && \
    rm -rf /var/lib/apt/lists/*
I tried to replace python3-pip with just python-pip, but when I run it I get the following error:
E: Unable to locate package python-pip
I've tried a lot of solutions, but I always end up with the same problem.
Outside of Docker, if python3.6 is the Python you need, you can do:
python3.6 -m pip install
In Docker, python3 obviously points to Python 3.8 right now, so you must first install Python 3.6 and find out how to call it (python3.6 or python3). You might need to compile it from source and create a symbolic link. This can get very ugly to do inside a Dockerfile, but you can write a shell script with all the commands and run that script inside the container. Or, if you are lucky, you may find a ready-made Python 3.6 package that works for you and apt-get install it instead of python3, the same way as you do now.
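Concretely, since the Dockerfile above already adds the deadsnakes PPA, one hedged sketch is to install python3.6 from it and give that interpreter its own pip via the version-specific get-pip.py (the /pip/3.6/ URL is the one that still supports Python 3.6). Package names here assume deadsnakes publishes python3.6 for your Ubuntu release, and <your-packages> is a placeholder:
RUN apt-get update && \
    apt-get install -y software-properties-common curl && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y python3.6 python3.6-distutils
# bootstrap a pip that belongs to 3.6, not to the system's 3.8
RUN curl -sS https://bootstrap.pypa.io/pip/3.6/get-pip.py | python3.6
# from here on, always invoke pip through the interpreter you actually want
RUN python3.6 -m pip install <your-packages>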
Running into an unexpected issue trying to prepare an Ubuntu 20.04-based image with Python and pyodbc.
FROM ubuntu:20.04
# install mssql odbc driver
RUN apt-get update && apt-get upgrade -y && apt-get install -y curl gnupg build-essential
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
# install python3 and pip
RUN apt-get install -y python3 python3-pip
# clean up
# this does not work
RUN apt-get remove -y perl curl gnupg && apt-get autoremove -y
# this works
# RUN apt-get remove -y curl gnupg && apt-get autoremove -y
RUN pip3 install pyodbc
If perl is not removed, the installation of pyodbc is uneventful, but if perl is removed, the following error is displayed:
src/pyodbc.h:56:10: fatal error: sql.h: No such file or directory
It is as if unixodbc-dev is also removed for some reason. Has anyone run into this before? If perl is required, wouldn't apt-get prevent it from being deleted? Or do I need to install a different set of C bindings to make this work?
Running apt-get install -f after installing msodbcsql17 doesn't help either.
Thanks.
unixodbc-dev was installed as a transitive dependency and was automatically removed when no longer needed, i.e. after perl was removed. You need to install it explicitly:
RUN apt-get install -y unixodbc-dev
See the following bug report for details: https://github.com/mkleehammer/pyodbc/issues/441
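If you would rather keep the original install line, another hedged option is to tell apt explicitly that the package is wanted, so autoremove leaves it alone; apt-mark is the standard apt tool for this (a sketch, untested against this exact image):
# check whether apt considers the package auto-installed (an autoremove candidate)
RUN apt-mark showauto | grep unixodbc-dev || true
# pin it as manually installed so `apt-get autoremove -y` will not sweep it away
RUN apt-mark manual unixodbc-dev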
I have a Python-based machine learning project that I want to dockerize. I have several heavy dependencies like dlib, face_recognition, tensorflow, OpenCV, etc.
Following is my Dockerfile:
FROM ubuntu:18.04
WORKDIR /app
RUN apt update \
    && apt -y upgrade \
    && apt install -y python3 \
    && apt install -y python3-pip \
    && apt install -y poppler-utils \
    && apt install -y libsm6 libxext6 libxrender-dev
ARG DEBIAN_FRONTEND=noninteractive
RUN apt install -y postgresql
COPY dlib-19.17.0-cp36-cp36m-linux_x86_64.whl /app/dlib-19.17.0-cp36-cp36m-linux_x86_64.whl
COPY requirements.txt /app/requirements.txt
RUN pip3 install dlib-19.17.0-cp36-cp36m-linux_x86_64.whl \
    && pip3 install -r requirements.txt
COPY . /app
CMD gunicorn -t 300 --workers 5 --bind 0.0.0.0:8080 wsgi
After building, the image turns out to be 2.5 GB. Is it OK to have an image this big? If not, how can I reduce the size while keeping the dependencies?
Whether the size is a problem depends on whether you just want to run a container or you want to distribute and scale it. In the latter case, size is a problem because it slows down pulling and deployment.
What you need to do to reduce the size is use a multi-stage build. Here is a sample of my own: https://github.com/eez0/docker-samples/blob/master/Dockerfile_python
The bottom line is to differentiate between the build process and the running process. For example, you currently install all the dependencies up front, but some of them are only needed at build time, so it's safe to leave them out of the final image (see the sketch below).
If for some reason you don't want to get into multi-stage builds, then use --no-install-recommends to install only what's necessary, and remove all the unnecessary build dependencies and the apt cache at the end. Also, try to use a smaller base image, for example python:3.7-slim.
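A rough sketch of what that separation could look like here, assuming python:3.6-slim as the base (the prebuilt dlib wheel is cp36, so 3.6 rather than 3.7). The postgresql install from the question is omitted, and which runtime libraries you really need depends on requirements.txt, so treat this as a starting point rather than a drop-in replacement:
# build stage: compilers, headers and the pip work happen only here
FROM python:3.6-slim AS builder
WORKDIR /app
# build-essential only in case some requirement has to compile from source
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
COPY dlib-19.17.0-cp36-cp36m-linux_x86_64.whl requirements.txt ./
# one pip invocation so the local dlib wheel satisfies any dlib requirement
RUN pip install --prefix=/install dlib-19.17.0-cp36-cp36m-linux_x86_64.whl -r requirements.txt

# runtime stage: only the installed packages plus runtime libraries
FROM python:3.6-slim
WORKDIR /app
# libxrender1 instead of libxrender-dev: the runtime library is enough here
RUN apt-get update \
    && apt-get install -y --no-install-recommends poppler-utils libsm6 libxext6 libxrender1 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
COPY . /app
CMD gunicorn -t 300 --workers 5 --bind 0.0.0.0:8080 wsgi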
I'm trying to install apache-airflow the recommended way, with pip install apache-airflow. During the install of pendulum (a dependency), I get an error:
error: can't copy 'pendulum/parsing': doesn't exist or not a regular file
I think it's related to Python distutils error: "[directory]... doesn't exist or not a regular file", but that doesn't answer how one resolves this when using pip. Pulling the tar for pendulum and installing it with python setup.py install works, but when I subsequently run pip install apache-airflow again, it sees that pendulum is already installed, UNINSTALLS it, and then tries to install it again with pip, resulting in the same error. I'm using a Docker container and installing python-setuptools with apt-get before I do any of this. Here's my Dockerfile, fwiw...
FROM phusion/baseimage:0.10.1
MAINTAINER a curious dev
RUN apt-get update && apt-get install -y python-setuptools python-pip python-dev libffi-dev libssl-dev zip wget
ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
RUN wget https://files.pythonhosted.org/packages/5b/57/71fc910edcd937b72aa0ef51c8f5734fbd8c011fa1480fce881433847ec8/pendulum-2.0.4.tar.gz
RUN tar -xzvf pendulum-2.0.4.tar.gz
RUN cd pendulum-2.0.4/ && python setup.py install
RUN pip install apache-airflow
CMD airflow initdb && airflow webserver -p 8080
Does anyone see anything I'm doing wrong? I haven't found anyone else with this error so I think there's something really obvious I'm missing. Thanks for reading.
Upgrade pip first.
FROM phusion/baseimage:0.10.1
RUN apt-get update && apt-get install -y python-setuptools python-pip python-dev libffi-dev libssl-dev zip wget
ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
RUN pip install -U pip
RUN pip install apache-airflow
CMD airflow initdb && airflow webserver -p 8080
seems to work fine for me.
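One caveat if you rebuild this today: the base here is Python 2, and pip 21.0 dropped Python 2 support, so the unpinned upgrade can eventually pull a pip that no longer runs. Pinning the upgrade is the defensive variant (my tweak, not part of the answer above):
RUN pip install -U "pip<21"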
I'm trying to install awscli using pip (as per Amazon's recommendations) in a custom Docker image that comes FROM library/node:6.11.2. Here's a repro:
FROM library/node:6.11.2
RUN apt-get update && \
    apt-get install -y \
        python \
        python-pip \
        python-setuptools \
        groff \
        less \
    && pip --no-cache-dir install --upgrade awscli \
    && apt-get clean
CMD ["/bin/bash"]
However, with the above I'm met with:
no such option: --no-cache-dir
Presumably because I've got incorrect versions of Python and/or Pip?
I'm installing Python, Pip, and awscli in a similar way with FROM maven:3.5.0-jdk-8 and there it works just fine. I'm unsure what the relevant differences between the two images are.
Removing said option from my Dockerfile doesn't do me much good either, because then I'm met with a big pile of different errors, excerpted here:
Installing collected packages: awscli, PyYAML, docutils, rsa, colorama, botocore, s3transfer, pyasn1, jmespath, python-dateutil, futures, six
  Running setup.py install for PyYAML
    checking if libyaml is compilable
### ABBREVIATED ###
ext/_yaml.c:4:20: fatal error: Python.h: No such file or directory
 #include "Python.h"
                    ^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
### ABBREVIATED ###
Bottom line: how do you properly install awscli in library/node:6.x based images?
Adding python-dev as per this other answer works, but it throws an alarming number of compiler warnings (errors?), so I went with a variation of @SergeyKoralev's answer, which needed some tweaking before it worked.
Here are the changes I needed to make this work:
Change to python3 and pip3 everywhere.
Add a statement to upgrade pip itself.
Separate the awscli install into its own RUN command.
Here's a full repro that does seem to work:
FROM library/node:6.11.2
RUN apt-get update && \
    apt-get install -y \
        python3 \
        python3-pip \
        python3-setuptools \
        groff \
        less \
    && pip3 install --upgrade pip \
    && apt-get clean
RUN pip3 --no-cache-dir install --upgrade awscli
CMD ["/bin/bash"]
You can probably also keep the aws install in the same RUN layer if you add a shell command before the install that refreshes things after upgrading pip. Not sure how though.
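The missing "refresh" is most likely the shell's command-path cache: within a single RUN, the shell can keep resolving pip3 to the old location even after the upgrade installs a newer one under /usr/local/bin, and hash -r (a POSIX shell builtin) clears that cache. A sketch of the single-layer variant under that assumption:
# upgrade pip, clear the shell's cached command locations, then use the new pip3
RUN apt-get update && \
    apt-get install -y python3 python3-pip python3-setuptools groff less \
    && pip3 install --upgrade pip \
    && hash -r \
    && pip3 --no-cache-dir install --upgrade awscli \
    && apt-get clean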
All the answers are about AWS CLI version 1. If you want version 2, try the below:
FROM node:lts-stretch-slim
RUN apt-get update && \
    apt-get install -y \
        unzip \
        curl \
    && apt-get clean \
    && curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" \
    && unzip awscliv2.zip \
    && ./aws/install \
    && rm -rf awscliv2.zip \
    && apt-get -y purge curl \
    && apt-get -y purge unzip
CMD ["/bin/bash"]
As you have correctly stated, the pip on the Docker image you are using is an older one that does not support --no-cache-dir. You can try updating it, or you can fix the second problem, which is about missing Python source headers. That one can be fixed by installing the python-dev package. Just add it to the list of packages installed in the Dockerfile:
FROM library/node:6.11.2
RUN apt-get update && \
    apt-get install -y \
        python \
        python-dev \
        python-pip \
        python-setuptools \
        groff \
        less \
    && pip install --upgrade awscli \
    && apt-get clean
CMD ["/bin/bash"]
You can then run aws, which should be on your PATH.
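For instance, a quick smoke test at the end of the build:
RUN aws --version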
Your image is based on Debian Jessie, so you are installing Python 2.7. Try using Python 3.x:
apt-get install -y python3-pip
pip3 install awscli
Install the AWS CLI in a Docker container using the command below:
apt update; apt upgrade -y; apt install -y python3 python3-pip python3-setuptools; python3 -m pip --no-cache-dir install --upgrade awscli
To check the assumed role or AWS identity, run the command below:
aws sts get-caller-identity