Python scrapy+selenium scraper docker build error (error code 100) - python

I'm trying to deploy a Python scraper to Docker; it requires Selenium, geckodriver, and Firefox.
I'm encountering error code 100 while trying to build the Docker image.
Here's the Dockerfile:
FROM scrapinghub/scrapinghub-stack-scrapy:1.3-py3
#RUN apt-get install -y apt-transport-https unzip
RUN apt-get install unzip
RUN printf "deb http://archive.debian.org/debian/ jessie main\ndeb-src http://archive.debian.org/debian/ jessie main\ndeb http://security.debian.org jessie/updates main\ndeb-src http://security.debian.org jessie/updates main" > /etc/apt/sources.list
#============================================
# Firefox and Geckodriver
#============================================
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates curl firefox-esr \
&& rm -fr /var/lib/apt/lists/* \
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz | tar xz -C /usr/local/bin \
&& apt-get purge -y ca-certificates curl
ENV TERM xterm
ENV SCRAPY_SETTINGS_MODULE <my_project_name>.settings
RUN mkdir -p /app
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
COPY . /app
RUN python setup.py install
Here's the log that eventually comes up in my terminal:
Error The command '/bin/sh -c apt-get update && apt-get install -y --no-install-recommends ca-certificates curl firefox-esr && rm -fr /var/lib/apt/lists/*
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz | tar xz -C /usr/local/bin && apt-get purge -y ca-certificates curl'
returned a non-zero code: 100:
{'code': 100, 'message': "The command '/bin/sh -c apt-get update && apt-get install -y --no-install-recommends ca-certificates curl firefox-esr && rm -fr /var/lib/apt/lists/*
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz | tar xz -C /usr/local/bin
&& apt-get purge -y ca-certificates curl' returned a non-zero code: 100"}

I had the same issue. Running docker build . directly gave me the following log line, which points to the problem:
#9 23.59 W: GPG error: http://archive.debian.org jessie Release: The following signatures were invalid: KEYEXPIRED 1587841717
If I remove the line:
RUN printf "deb http://archive.debian.org/debian/ jessie main\ndeb-src http://archive.debian.org/debian/ jessie main\ndeb http://security.debian.org jessie/updates main\ndeb-src http://security.debian.org jessie/updates main" > /etc/apt/sources.list
Then it appears to behave.
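If you don't actually need the jessie sources, the simplest fix is to drop that RUN printf line. If you do still need them, one possible workaround - untested here, and it disables signature and Release-expiry checks, so treat it as an assumption - is to mark the archived repos as trusted and skip the Valid-Until check when updating:
RUN printf "deb [trusted=yes] http://archive.debian.org/debian/ jessie main\ndeb-src [trusted=yes] http://archive.debian.org/debian/ jessie main\n" > /etc/apt/sources.list
# The archived Release files have expired, so also skip the Valid-Until check
RUN apt-get -o Acquire::Check-Valid-Until=false update \
 && apt-get install -y --no-install-recommends unzip ca-certificates curl firefox-esr \
 && rm -rf /var/lib/apt/lists/*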

Related

How do I update pyjwt from 2.3.0 to 2.4.0 in a Dockerfile (Ubuntu 22.04)?

We found one vulnerability in the base Docker image: "pyjwt version 2.3.0 has 1 vulnerability", fixed in version pyjwt 2.4.0.
Below is the Dockerfile:
FROM ubuntu:22.04
# hadolint ignore=DL3015
# hadolint ignore=DL3008
RUN apt-get clean
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update
RUN apt-get -y upgrade apt \
&& apt-get install -y unoconv ghostscript software-properties-common \
&& add-apt-repository ppa:ondrej/php -y \
&& apt -y install php7.4 \
&& apt-get install -y curl jq php7.4-bcmath php7.4-xml zip unzip php7.4-zip \
&& apt-get install -y php7.4-fpm php7.4-amqp composer nginx openssl php7.4-curl ca-certificates \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" \
&& unzip awscliv2.zip \
&& ./aws/install \
&& rm awscliv2.zip
# Setup services
COPY ./src/scripts/nginx.conf /etc/nginx/nginx.conf
COPY ./src/scripts/run.sh /opt/run.sh
RUN chmod -R a+rw /etc/nginx
RUN chmod -R a+rw /etc/php/7.4/fpm
RUN chmod +x /opt/run.sh
EXPOSE 8080 8443
CMD [ "/opt/run.sh" ]
I have tried many things, like installing python3 and updating the pyjwt package with pip install pyjwt==2.4.0, but it didn't work. It seems like one of the packages in the Dockerfile above is pulling in pyjwt 2.3.0, and I don't know how to update it.
You can try uninstalling the python3-jwt package with apt and installing the newer version with pip:
RUN apt purge --autoremove python3-jwt -y
RUN pip3 install PyJWT==2.4.0
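To double-check which version actually ends up in the image, you could add a verification step after those two lines. A minimal sketch, assuming python3 and pip3 are available in the image (on Ubuntu 22.04 that may first require apt-get install -y python3-pip); note that PyJWT's import name is jwt:
RUN pip3 show pyjwt
RUN python3 -c "import jwt; print(jwt.__version__)"   # expect 2.4.0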

Why is pdftocairo not getting installed when installing poppler?

I am using the following command in the Dockerfile to install poppler:
RUN wget --directory-prefix=~ poppler.freedesktop.org/poppler-22.04.0.tar.xz -O /tmp/poppler-22.04.0.tar.xz \
 && apt-get install xz-utils \
 && tar xf /tmp/poppler-22.04.0.tar.xz -C /tmp \
 && cd /tmp/poppler-22.04.0 \
 && ls \
 && mkdir build \
 && cd build \
 && apt-get -y install libfreetype6-dev \
 && apt-get -y install pkg-config \
 && apt-get -y install libfontconfig1-dev \
 && apt-get -y install libjpeg-dev \
 && apt-get -y install libopenjp2-7-dev \
 && apt-get -y install build-essential cmake \
 && cmake -DENABLE_BOOST=OFF .. \
 && make \
 && make install
When I exec inside the container and do
pdftocairo -v
it says command not found. However when I do
pdftotext -v
I get the following output
pdftotext version 22.04.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
I was expecting something similar for pdftocairo, since I need it to convert PDF files to JPEG images. Can someone explain why this is happening?
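One likely cause (an assumption based on how poppler's CMake configuration works, not on anything in your build log): pdftocairo is only built when the cairo development headers are found at configure time, so if libcairo2-dev is missing, CMake silently skips it while still building pdftotext. A sketch of the missing step, to run before cmake:
RUN apt-get -y install libcairo2-dev
# then re-run: cmake -DENABLE_BOOST=OFF .. && make && make install
# the cmake summary should then report cairo support, and pdftocairo should be built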

How to install Python2.7.5 in Ubuntu docker image?

I have a specific requirement to install Python 2.7.5 in Ubuntu. I could install 2.7.18 without any issues.
Below is my Dockerfile:
ARG UBUNTU_VERSION=18.04
FROM ubuntu:$UBUNTU_VERSION
RUN apt-get update -y \
&& apt-get install -y python2.7.x \
&& rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["python"]
However, if I set it to python2.7.5:
ARG UBUNTU_VERSION=18.04
FROM ubuntu:$UBUNTU_VERSION
RUN apt-get update -y \
&& apt-get install -y python2.7.5 \
&& rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["python"]
it throws the following error:
E: Couldn't find any package by regex 'python2.7.5'
I want to install Python 2.7.5 along with the matching pip. What should I do?
This version is no longer available in the Canonical mirrors; it was released in 2013.
As a result, getting Python and a matching pip working together is challenging.
Python 2.7.5 + pip on CentOS 7
This may be the simplest way if Ubuntu is not a hard requirement.
ARG CENTOS_VERSION=7
FROM centos:$CENTOS_VERSION
# Python 2.7.5 is installed with centos7 image
# Add repository for PIP
RUN yum install -y epel-release
# Install pip
RUN yum install -y python-pip
RUN python --version
ENTRYPOINT [ "python" ]
Python 2.7.5 on Ubuntu
I've been able to install it from source, but I did not manage to get pip installed via
https://bootstrap.pypa.io/pip/2.7/get-pip.py
ARG UBUNTU_VERSION=18.04
FROM ubuntu:$UBUNTU_VERSION
ARG PYTHON_VERSION=2.7.5
# Install dependencies
# PIP - openssl version > 1.1 may be an issue (try older ubuntu images)
RUN apt-get update \
&& apt-get install -y wget gcc make openssl libffi-dev libgdbm-dev libsqlite3-dev libssl-dev zlib1g-dev \
&& apt-get clean
WORKDIR /tmp/
# Build Python from source
RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
&& tar --extract -f Python-$PYTHON_VERSION.tgz \
&& cd ./Python-$PYTHON_VERSION/ \
&& ./configure --enable-optimizations --prefix=/usr/local \
&& make && make install \
&& cd ../ \
&& rm -r ./Python-$PYTHON_VERSION*
RUN python --version
ENTRYPOINT [ "python" ]
Python 2.7.6 + pip on Ubuntu
Ubuntu 14.04 still has working mirrors (for how long?).
Its Python packages are very close to what you're asking for, so you may be able to run your scripts with this one.
ARG UBUNTU_VERSION=14.04
FROM ubuntu:$UBUNTU_VERSION
RUN apt-get update \
&& apt-get install -y python python-pip \
&& apt-get clean
RUN python --version
ENTRYPOINT [ "python" ]
Python 2.7.5 + pip on Ubuntu - not working yet, but it could
Here is what I've tried, without success.
ARG UBUNTU_VERSION=16.04
FROM ubuntu:$UBUNTU_VERSION
ARG PYTHON_VERSION=2.7.5
# Install dependencies
RUN apt-get update \
&& apt-get install -y wget gcc make openssl libffi-dev libgdbm-dev libsqlite3-dev libssl-dev zlib1g-dev \
&& apt-get clean
WORKDIR /tmp/
# Build python from source
RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
&& tar --extract -f Python-$PYTHON_VERSION.tgz \
&& cd ./Python-$PYTHON_VERSION/ \
&& ./configure --enable-optimizations --prefix=/usr/local \
&& make && make install \
&& cd ../ \
&& rm -r ./Python-$PYTHON_VERSION*
# Build pip from source
RUN wget https://bootstrap.pypa.io/pip/2.7/get-pip.py \
&& python get-pip.py
RUN python --version
ENTRYPOINT [ "python" ]
Python 2.7.9 with pip - as requested in a comment
You can use this Dockerfile; building Python with --with-ensurepip=install also installs pip.
ARG UBUNTU_VERSION=16.04
FROM ubuntu:$UBUNTU_VERSION
ARG PYTHON_VERSION=2.7.9
# Install dependencies
RUN apt-get update \
&& apt-get install -y wget gcc make openssl libffi-dev libgdbm-dev libsqlite3-dev libssl-dev zlib1g-dev \
&& apt-get clean
WORKDIR /tmp/
# Build Python from source
RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
&& tar --extract -f Python-$PYTHON_VERSION.tgz \
&& cd ./Python-$PYTHON_VERSION/ \
&& ./configure --with-ensurepip=install --enable-optimizations --prefix=/usr/local \
&& make && make install \
&& cd ../ \
&& rm -r ./Python-$PYTHON_VERSION*
RUN python --version \
&& pip --version
ENTRYPOINT [ "python" ]
The simplest possible solution:
sudo apt-get install libssl-dev openssl
wget https://www.python.org/ftp/python/2.7.5/Python-2.7.5.tgz
tar xzvf Python-2.7.5.tgz
cd Python-2.7.5
./configure
make
sudo make install
After the installation completes, set the installed Python as the default one.
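One way to make the freshly built interpreter the default, as a sketch using update-alternatives and assuming make install put it in /usr/local/bin:
sudo update-alternatives --install /usr/bin/python python /usr/local/bin/python2.7 10
sudo update-alternatives --set python /usr/local/bin/python2.7
python --version   # should now report Python 2.7.5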

Can't open .env/bin/activate with docker-compose, but it works with docker build

Following is my Dockerfile:
FROM ubuntu:18.04
# Upgrade installed packages
RUN apt update
RUN apt upgrade -y
ENV TZ=Europe/London
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install -y libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
WORKDIR /code
RUN apt-get -y install python3-pip
RUN apt-get -y install python3-venv
RUN apt -y install python3-setuptools libffi-dev python3-dev
RUN apt install -y curl
RUN apt install -y unzip
RUN apt-get install -y build-essential swig
WORKDIR /code
RUN python3 -m venv .env
RUN . .env/bin/activate && pip install --upgrade pip && curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | LC_ALL=C.UTF-8 xargs -n 1 -L 1 pip install
COPY requirements.txt requirements.txt
RUN . .env/bin/activate && pip install pyenchant && pip install -r requirements.txt
RUN apt install -y libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
RUN apt-get install -y libenchant1c2a
RUN mkdir embeddings
COPY . .
RUN curl -L http://nlp.stanford.edu/data/glove.6B.zip --output glove.zip
RUN unzip -o glove.zip -d embeddings/
RUN . .env/bin/activate && python nltk_install.py
CMD . .env/bin/activate && python main.py
It works when built using docker build .
Here is the docker-compose file:
version: "3.9" # optional since v1.27.0
services:
pythonscripts:
build: .
volumes:
- "/mnt/d/code/data1/:/code/data/"
But when I try docker-compose up, it gives the following error:
Successfully built 06caf1786bfe
Successfully tagged openbotsdocumentsautomlpythonscripts_pythonscripts:latest
WARNING: Image for service pythonscripts was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating openbotsdocumentsautomlpythonscripts_pythonscripts_1 ... done
Attaching to openbotsdocumentsautomlpythonscripts_pythonscripts_1
pythonscripts_1 | /bin/sh: 1: .: Can't open .env/bin/activate
openbotsdocumentsautomlpythonscripts_pythonscripts_1 exited with code 2
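One way to narrow this down is to check whether the virtualenv actually exists in the image that docker-compose runs, a diagnostic sketch using the service name from the compose file above:
docker-compose build --no-cache pythonscripts
docker-compose run --rm pythonscripts ls -la /code/.env/bin
If activate is listed there but the CMD still fails, the problem is at runtime (for example something shadowing /code); if it is missing, the problem is in the build.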

Changing a Dockerfile from CUDA9 to CUDA10

I need to change the Dockerfile below, coming from pytorch_geometric (a popular PyTorch package), from CUDA9.0 to CUDA10.0.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils ca-certificates apt-transport-https gnupg-curl && \
rm -rf /var/lib/apt/lists/* && \
NVIDIA_GPGKEY_SUM=d1be581509378368edeec8c1eb2958702feedf3bc3d17011adbf24efacce4ab5 && \
NVIDIA_GPGKEY_FPR=ae09fe4bbd223a84b2ccfce3f60f4b3d7fa2af80 && \
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub && \
apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +5 > cudasign.pub && \
echo "$NVIDIA_GPGKEY_SUM cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
ENV CUDA_VERSION 9.0.176
ENV NCCL_VERSION 2.4.2
ENV CUDA_PKG_VERSION 9-0=$CUDA_VERSION-1
ENV CUDNN_VERSION 7.4.2.24
RUN apt-get update && apt-get install -y --no-install-recommends \
cuda-cudart-$CUDA_PKG_VERSION && \
ln -s cuda-9.0 /usr/local/cuda && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --allow-unauthenticated --no-install-recommends \
cuda-libraries-$CUDA_PKG_VERSION \
libnccl2=$NCCL_VERSION-1+cuda9.0 && \
apt-mark hold libnccl2 && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --allow-unauthenticated --no-install-recommends \
cuda-libraries-dev-$CUDA_PKG_VERSION \
cuda-nvml-dev-$CUDA_PKG_VERSION \
cuda-minimal-build-$CUDA_PKG_VERSION \
cuda-command-line-tools-$CUDA_PKG_VERSION \
cuda-core-9-0=9.0.176.3-1 \
cuda-cublas-dev-9-0=9.0.176.4-1 \
libnccl-dev=$NCCL_VERSION-1+cuda9.0 && \
rm -rf /var/lib/apt/lists/*
ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs
# NVIDIA docker 1.0.
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# NVIDIA container runtime.
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.0"
# PyTorch (Geometric) installation
RUN rm /etc/apt/sources.list.d/cuda.list && \
rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
vim \
sudo \
git \
bzip2 \
libx11-6 \
&& rm -rf /var/lib/apt/lists/*
# Create a working directory.
RUN mkdir /app
WORKDIR /app
# Create a non-root user and switch to it.
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
&& chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user
# All users can use /home/user as their home directory.
ENV HOME=/home/user
RUN chmod 777 /home/user
# Install Miniconda.
RUN curl -so ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh \
&& chmod +x ~/miniconda.sh \
&& ~/miniconda.sh -b -p ~/miniconda \
&& rm ~/miniconda.sh
ENV PATH=/home/user/miniconda/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
# Create a Python 3.6 environment.
RUN /home/user/miniconda/bin/conda install conda-build \
&& /home/user/miniconda/bin/conda create -y --name py36 python=3.6.5 \
&& /home/user/miniconda/bin/conda clean -ya
ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/home/user/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
# CUDA 9.0-specific steps.
RUN conda install -y -c pytorch \
cuda90=1.0 \
magma-cuda90=2.4.0 \
"pytorch=1.1.0=py3.6_cuda9.0.176_cudnn7.5.1_0" \
torchvision=0.2.1 \
&& conda clean -ya
# Install HDF5 Python bindings.
RUN conda install -y h5py=2.8.0 \
&& conda clean -ya
RUN pip install h5py-cache==1.0
# Install TorchNet, a high-level framework for PyTorch.
RUN pip install torchnet==0.0.4
# Install Requests, a Python library for making HTTP requests.
RUN conda install -y requests=2.19.1 \
&& conda clean -ya
# Install Graphviz.
RUN conda install -y graphviz=2.38.0 \
&& conda clean -ya
RUN pip install graphviz==0.8.4
# Install OpenCV3 Python bindings.
RUN sudo apt-get update && sudo apt-get install -y --no-install-recommends \
libgtk2.0-0 \
libcanberra-gtk-module \
&& sudo rm -rf /var/lib/apt/lists/*
RUN conda install -y -c menpo opencv3=3.1.0 \
&& conda clean -ya
# Install PyTorch Geometric.
RUN CPATH=/usr/local/cuda/include:$CPATH \
&& LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH \
&& DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
RUN pip install --verbose --no-cache-dir torch-scatter \
&& pip install --verbose --no-cache-dir torch-sparse \
&& pip install --verbose --no-cache-dir torch-cluster \
&& pip install --verbose --no-cache-dir torch-spline-conv \
&& pip install torch-geometric
# Set the default command to python3.
CMD ["python3"]
I've tried starting it with FROM pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-runtime, commenting out everything up to # PyTorch (Geometric) installation, and replacing the # CUDA 9.0-specific steps. section with
RUN conda install -c pytorch pytorch
RUN conda install -c fragcolor cuda10.0 && conda clean -ya
and commenting out
# Install Graphviz.
RUN conda install -y graphviz=2.38.0 \
&& conda clean -ya
RUN pip install graphviz==0.8.4
which didn't seem to work even with CUDA 9.0.
This makes the Docker image build and run: PyTorch can be imported and CUDA works as well. However, when I try to import torch_geometric I get ModuleNotFoundError: No module named 'torch_scatter.scatter_cuda'.
Since the package is well maintained (4.5k stars, mentioned on the PyTorch website), it seems likely that this is my fault and that it's something general about adapting from CUDA 9.0 to CUDA 10.0.
I'd appreciate any advice on what I could be doing wrong, or a way of changing the Dockerfile without removing so many lines from the original, which is probably what's causing the issue.
Have you ever tried the nvidia/cuda Docker base image? Try:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
python3.6 \
python-dev \
python-pip \
python-setuptools \
&& \
rm -rf /var/lib/apt/lists/* && \
apt-get update
RUN pip install --upgrade pip==9.0.3 && \
pip --no-cache-dir install --upgrade torch==1.1.0 && \
pip --no-cache-dir install --upgrade torchvision==0.3.0
The package versions I wrote are stable to use in a Dockerfile.
I checked it and it worked well without any conflicts.
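If you go this route, a quick smoke test after building might look like this (the image tag is just an example, and --gpus requires Docker 19.03+ with the NVIDIA container toolkit; python here is the interpreter that python-pip installed torch for):
docker build -t pyg-cuda10 .
docker run --rm --gpus all pyg-cuda10 python -c "import torch; print(torch.__version__, torch.cuda.is_available())"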
