Install poppler in AWS base python image for Lambda - python

I am trying to deploy my Docker container on AWS Lambda. My code uses the pdf2image package, which depends on poppler. To install poppler, I need the following line in the Dockerfile:
RUN apt-get install -y poppler-utils
This is the full Dockerfile:
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y poppler-utils
RUN apt-get install python3 -y
RUN apt-get install python3-pip -y
RUN pip3 install --upgrade pip
WORKDIR /
COPY app.py .
COPY requirements.txt .
RUN pip3 install -r requirements.txt
ENTRYPOINT [ "python3", "app.py" ]
However, to deploy on Lambda, I need to use the AWS base Python image for Lambda. This is my attempt to rewrite the above Dockerfile to use the Lambda base image:
FROM public.ecr.aws/lambda/python:3.6
# The following lines fail with: apt-get: command not found
# RUN apt-get update
# RUN apt-get install -y poppler-utils
COPY app.py .
COPY requirements.txt .
RUN pip install -r requirements.txt
CMD ["app.handler"]
As the Dockerfile above shows, the apt-get command cannot be run. That is understandable, because the base image is no longer Ubuntu as it was in my first Dockerfile. My question is: how can I install poppler in the Lambda base image?

The Lambda base image is based on Amazon Linux, which uses the yum package manager rather than apt-get, so you can do the following instead:
FROM public.ecr.aws/lambda/python:3.6
RUN yum install -y poppler-utils
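To confirm that the install worked, a quick check can be run inside the built image. This is only a minimal sketch: missing_tools and POPPLER_TOOLS are hypothetical names, and it assumes pdf2image looks for the pdftoppm and pdfinfo binaries from poppler-utils on PATH.

```python
import shutil

# Assumption: these are the poppler command-line tools pdf2image relies on.
POPPLER_TOOLS = ["pdftoppm", "pdfinfo"]


def missing_tools(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(POPPLER_TOOLS)
    if missing:
        print("Missing poppler tools:", ", ".join(missing))
    else:
        print("poppler tools found on PATH")
```

An empty result from missing_tools(POPPLER_TOOLS) suggests that poppler was installed where the Lambda function can find it.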

Related

Autocomplete of python code does not work with custom docker container for databricks runtime

I've set up a custom Docker container for Databricks, but there is no autocomplete for Python (it works for Scala).
This is my container:
FROM databricksruntime/standard:9.x
RUN apt-get update && apt-get -y install build-essential \
    nvidia-cuda-toolkit \
    libpython3.8-dev
ADD requirements.txt .
RUN /databricks/python3/bin/pip3 install --upgrade pip
RUN /databricks/python3/bin/pip3 install -r requirements.txt
What am I missing?
Thanks in advance!

Docker Won't Install Python

I recently started learning docker and I was attempting to build a flask python image by following a tutorial video.
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install python
CMD echo "Python Installed"
RUN pip install flask
COPY . /opt/source-code
ENTRYPOINT FLASK_APP=/opt/source-code/app.py flask run
This is the Dockerfile in my source code working directory. I run sudo docker build . -t nxte/custom-app on a DigitalOcean droplet with Docker installed, but it fails with: The command '/bin/sh -c apt-get install python' returned a non-zero code: 1.
Any suggestions? I have no idea what the problem is since I followed the tutorial to a T.
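You should use -y with apt-get. Without it, apt-get asks for confirmation, and because docker build has no interactive input, the install aborts: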
RUN apt-get -y install python
Also note that the above does not install pip, and it is not possible to install it with apt-get -y install python-pip. Either switch to Python 3 (apt-get -y install python3 python3-pip) or get pip from another source.

TesseractNotFoundError: two docker container python app (docker-compose)

I have a Python project that uses tesseract. It runs locally and works in PyCharm.
I used docker-compose.yml, having two containers (app and t4re) as follows:
version: '3'
services:
  app:
    build: .
    image: ocr_app:latest
    depends_on:
      - tesseract
  tesseract:
    image: tesseractshadow/tesseract4re
    container_name: t4re
and my Dockerfile is as follows:
FROM python:3.6.1
# Create app directory
WORKDIR /app
# Bundle app source
COPY venv/src ./src
COPY venv/data ./data
# Install app dependencies
RUN pip install -r src/requirements.txt
CMD python src/ocr.py
and I keep getting these errors:
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
I am new to docker and read tons of documents, but I still cannot manage to fix this error. I've read the following answers. I guess I have to link tesseract to the python app with an environment variable, but I do not know how.
Use Tesseract 4 - Docker Container from uwsgi-nginx-flask-docker
TesseractNotFoundError: tesseract is not installed or it's not in your path
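You need to install tesseract in your Docker image before using it. pytesseract runs the tesseract binary as a subprocess, so the binary must be in the same container as your Python code; a separate tesseract container does not put it on your app's PATH. By default, the python:3.6.1 image does not include tesseract. Start from an Ubuntu base image, install tesseract and Python in it, and then continue your work.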
Here is the docker file for the solution:
FROM ubuntu:18.04
RUN apt-get update --fix-missing && apt-get install -y --fix-broken \
    poppler-utils tesseract-ocr libtesseract-dev libleptonica-dev \
    python3.6 python3-pip libsm6 libxext6 && \
    ldconfig
Please adjust the python version as per your requirement.
I had this issue on one of my projects that runs on Docker (an Ubuntu container).
To solve it, I had to:
- install pytesseract via requirements.txt, so your requirements.txt should contain:
pytesseract
- install tesseract-ocr. To do that, include the following lines in your Dockerfile:
FROM ubuntu:18.04
ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:alex-p/tesseract-ocr
RUN apt-get update && apt-get install -y tesseract-ocr-all
RUN apt-get install -y python3-pip python3-minimal libsm6 libxext6
# To make sure that tesseract-ocr is installed, uncomment the following line.
# RUN tesseract --version
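The same check as the commented-out RUN tesseract --version line can also be done from Python inside the app container. The sketch below is an illustration only; tool_version is a hypothetical helper, not part of pytesseract. It catches the same FileNotFoundError that appears in the question's traceback when the binary is not present in the container.

```python
import subprocess


def tool_version(command):
    """Return the first line printed by `<command> --version`,
    or None if the binary cannot be found on PATH."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            check=False,
        )
    except FileNotFoundError:
        # The binary is not installed in this container (or not on PATH).
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print("tesseract:", tool_version("tesseract"))
```

If this prints None when run in the app container, tesseract is not installed in that image, no matter what other containers docker-compose starts.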

Problem building docker with numpy and pandas over arm64

I'm trying to build a Docker image with docker-compose on my ARM64 Raspberry Pi, but it seems to be impossible.
This is my dockerfile:
FROM python:3.6-slim
RUN apt-get update && apt-get -y install python3-dev
RUN apt-get -y install python3-numpy
RUN apt-get -y install python3-pandas
ENTRYPOINT ["python3", "app.py"]
The build seems to be OK, but when app.py runs, it fails with the error "Module numpy not found", and the same happens for the pandas module.
If I try to install numpy and pandas using pip:
RUN pip install numpy pandas
it gives me an error or, more often, the Raspberry Pi just freezes and I have to unplug it to recover.
I have tried different Python versions for the source image, and I have also tried several Ubuntu images with Python installed on top.
Any idea how I can install numpy and pandas in Docker on my Raspberry Pi (ARM64)?
Thanks
The problem seems to be the Python version. I'm using a python3.6 Docker image, but the python3-numpy and python3-pandas packages both require python3.5, so installing them also installs a second Python version. That is why the interpreter can't find the modules when I try to import them: they are installed for a different Python version.
Finally, I solved it by using a generic Debian Docker image and installing python3.5 myself instead of using a Python Docker image:
FROM debian:stretch-slim
RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y install build-essential libssl-dev libffi-dev python3.5 libblas3 libc6 liblapack3 gcc python3-dev python3-pip cython3
RUN apt-get -y install python3-numpy python3-sklearn
RUN apt-get -y install python3-pandas
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt
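The mismatch described above can be made visible from inside the container by asking the running interpreter which binary it is and where, if anywhere, it would import a module from. This is a minimal sketch using only the standard library; module_location is a hypothetical helper name.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from,
    or None if this interpreter cannot find the module."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", ".".join(str(part) for part in sys.version_info[:3]))
    for name in ("numpy", "pandas"):
        print(name, "->", module_location(name))
```

If the interpreter path and version differ from the Python that the apt packages were installed for, module_location returns None for numpy and pandas even though the packages are present in the image.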
(Disclaimer: the Raspberry Pi 3 B+ is probably too slow to install big dependencies like numpy.)
This Dockerfile worked for me on a Raspberry Pi 3 B+ with software version Linux raspberrypi 5.10.63-v7+ (consider updating it):
FROM python:3.9-buster
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
I am not sure, but I think it also helped to clean up Docker, i.e. to remove images and containers with the following commands.
Warning: these commands delete all stopped containers and all images not used by a container!
$ docker container prune
$ docker image prune -a
Or reset Docker completely (this also deletes volumes and networks):
$ docker system prune --volumes
I recommend creating a requirements.txt file.
In it you can declare the packages to install.
The Dockerfile:
FROM python
COPY app.py /workdir/
COPY requirements.txt /workdir/
WORKDIR /workdir
RUN pip install --trusted-host pypi.python.org -r requirements.txt
CMD python app.py
Edit:
I created a Dockerfile that installs the pandas library, and then checked that it works:
cat Dockerfile
FROM python
COPY app.py /workdir/
WORKDIR /workdir
RUN python -m pip install pandas
CMD python app.py

How to check whether python package is installed or not in Docker?

I successfully built a container from a Dockerfile. However, my code doesn't work in the container. It does work if I install all the packages manually, so I assume I did something wrong that kept Docker from installing the packages properly. I want to check whether a Python package is installed in the Docker container. What is the best way to check?
The Dockerfile I used:
# Update the sources list
RUN sudo apt-get update
# Install basic applications
RUN sudo apt-get install -y tar git curl nano wget dialog net-tools build-essential
# First install ZeroMQ
RUN sudo apt-get install -y libzmq-dev
# Install libevent
RUN sudo apt-get install -y libevent-dev
# Install Python and Basic Python Tools
RUN sudo apt-get install -y python python-dev python-setuptools
RUN sudo apt-get install -y python-pip
# Add the current directory to the container
ADD . /root/code
# Get pip to download and install requirements:
RUN sudo pip install -r /root/code/requirements.txt
# Expose ports
EXPOSE 80 4242
# Define working directory.
WORKDIR /root/code
# Start the tcp server.
CMD python app.py
The requirements.txt I used:
gevent==1.0.1
greenlet==0.4.5
msgpack-python==0.4.2
pyzmq==13.1.0
wsgiref==0.1.2
zerorpc==0.4.4
I figured it out:
docker exec <container ID> pip list
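As an alternative to reading the pip list output by eye, the check can be scripted. The sketch below assumes a container with Python 3.8 or newer, where importlib.metadata is available in the standard library; the Dockerfile in the question installs Python 2, so this would not run there unchanged. installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version,
    or to None if the distribution is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names taken from the requirements.txt in the question.
    for name, ver in installed_versions(["gevent", "pyzmq", "zerorpc"]).items():
        print(name, ver if ver is not None else "NOT INSTALLED")
```

Saved as a file in the image, such a script could be run with docker exec in the same way as pip list.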
