I have a Python project that uses tesseract. It runs locally and works in PyCharm.
I set up a docker-compose.yml with two containers (app and t4re) as follows:
version: '3'
services:
  app:
    build: .
    image: ocr_app:latest
    depends_on:
      - tesseract
  tesseract:
    image: tesseractshadow/tesseract4re
    container_name: t4re
and my Dockerfile is as follows:
FROM python:3.6.1
# Create app directory
WORKDIR /app
# Bundle app source
COPY venv/src ./src
COPY venv/data ./data
# Install app dependencies
RUN pip install -r src/requirements.txt
CMD python src/ocr.py
and I keep getting these errors:
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
I am new to docker and read tons of documents, but I still cannot manage to fix this error. I've read the following answers. I guess I have to link tesseract to the python app with an environment variable, but I do not know how.
Use Tesseract 4 - Docker Container from uwsgi-nginx-flask-docker
TesseractNotFoundError: tesseract is not installed or it's not in your path
You need to install tesseract in your Docker image before using it. The python:3.6.1 image does not include tesseract by default. One option is to start from an Ubuntu base image, install both tesseract and Python in it, and then continue with your build.
Here is the docker file for the solution:
FROM ubuntu:18.04
RUN apt-get update --fix-missing && \
    apt-get install -y --fix-broken && \
    apt-get install -y poppler-utils tesseract-ocr libtesseract-dev libleptonica-dev \
        python3.6 python3-pip libsm6 libxext6 && \
    ldconfig
Please adjust the python version as per your requirement.
I had this issue on one of my projects that runs on Docker (an Ubuntu container).
To solve it, I had to:
- install pytesseract via requirements.txt, so your requirements.txt should contain:
pytesseract
- install tesseract-ocr. To do that, include the following lines in your Dockerfile:
FROM ubuntu:18.04
ENV PYTHONUNBUFFERED 1
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:alex-p/tesseract-ocr
RUN apt-get update && apt-get install -y tesseract-ocr-all
RUN apt-get install -y python3-pip python3-minimal libsm6 libxext6
# To make sure that tesseract-ocr is installed, uncomment the following line.
# RUN tesseract --version
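The `tesseract --version` check can also be run from Python when the container starts. The following is a minimal sketch, not part of pytesseract; the helper name `binary_version` is made up for illustration. It shows that launching a missing executable raises the same `FileNotFoundError` reported in the question.

```python
import subprocess


def binary_version(command):
    """Return the first line printed by `<command> --version`, or None if the
    executable cannot be found on PATH (hypothetical helper for illustration)."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            check=False,
        )
    except FileNotFoundError:
        # Raised when the executable is missing from the image or not on PATH.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = binary_version("tesseract")
    if version is None:
        print("tesseract is not installed in this container or not on PATH")
    else:
        print("found:", version)
```

Running this as the first step of the app makes a missing tesseract install visible before any OCR call is attempted.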
I am working on a project that requires me to run pytesseract in a Docker container, but I am unable to install tesseract in the container.
I also don't know what the file path for pytesseract should be.
My Dockerfile:
FROM python:3
ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y --no-install-recommends \
bzip2 \
g++ \
git \
graphviz \
libgl1-mesa-glx \
libhdf5-dev \
openmpi-bin \
wget \
python3-tk && \
rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
ENV QT_X11_NO_MITSHM=1
My pytesseract code:
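import cv2
# Assumed import: with it, pytesseract.tesseract_cmd is the module-level setting.
from pytesseract import pytesseract

path_to_tesseract = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
pytesseract.tesseract_cmd = path_to_tesseract
img = cv2.imread(fpath)
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
text = pytesseract.image_to_string(img)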
I see you are also using opencv. The following dependencies are required to use pytesseract and opencv:
FROM python:3.10-slim
ENV PYTHONUNBUFFERED=1
# tesseract-ocr is required for pytesseract; ffmpeg, libsm6 and libxext6 are required for opencv
RUN apt-get update \
    && apt-get -y install tesseract-ocr \
    && apt-get -y install ffmpeg libsm6 libxext6
...
RUN pip install -r requirements.txt
Since you are using Docker, I would recommend installing opencv-python-headless instead of the regular opencv package. The headless package is intended for environments without a GUI, such as Docker. It ships as a precompiled binary wheel and reduces the Docker image size. The Dockerfile then reduces to:
FROM python:3.10-slim
ENV PYTHONUNBUFFERED=1
RUN apt-get update \
&& apt-get -y install tesseract-ocr
...
RUN pip install -r requirements.txt
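With tesseract-ocr installed from apt, the binary is normally on PATH inside the container, so the hard-coded Windows path from the question is not needed there. Below is a minimal sketch of choosing the command per platform. `pick_tesseract_cmd` is a hypothetical helper, and the Windows path is simply the one from the question.

```python
import platform
import shutil

# Path taken from the question; only meaningful on the Windows dev machine.
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def pick_tesseract_cmd(system=None, which=shutil.which):
    """Return the tesseract command to hand to pytesseract (hypothetical helper).

    `system` and `which` are parameters only so the logic can be tested
    without a real tesseract install.
    """
    system = system or platform.system()
    if system == "Windows":
        return WINDOWS_TESSERACT
    # On Linux (e.g. inside the container) look the binary up on PATH and
    # fall back to the bare name, which is pytesseract's default.
    return which("tesseract") or "tesseract"


if __name__ == "__main__":
    print(pick_tesseract_cmd())
    # Usage with pytesseract (assumes `import pytesseract`):
    # pytesseract.pytesseract.tesseract_cmd = pick_tesseract_cmd()
```

The same script can then run unchanged on the Windows machine and in the container.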
My Python program creates a folder and writes some files into it. When I run the program inside Docker via CMD, it creates the folder and writes the files, but once it finishes the folder seems to be removed, or at least does not show up in the Docker image.
I have tried the following:
- Checked that the folder exists right after creating it: it is there.
- Checked inside the Docker image using bash: the folder and its contents are not there.
The dockerfile is
FROM ubuntu:18.04
# Upgrade installed packages
RUN apt update
RUN apt upgrade -y
ENV TZ=Europe/London
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install -y libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
WORKDIR /code
RUN apt-get -y install python3-pip
RUN apt-get -y install python3-venv
RUN apt -y install python3-setuptools libffi-dev python3-dev
RUN apt install -y curl
RUN apt install -y unzip
RUN apt-get install -y build-essential swig
WORKDIR /code
RUN python3 -m venv .env
RUN . .env/bin/activate && pip install --upgrade pip && curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | LC_ALL=C.UTF-8 xargs -n 1 -L 1 pip install
COPY requirements.txt requirements.txt
RUN . .env/bin/activate && pip install pyenchant && pip install -r requirements.txt
RUN apt install -y libgl1-mesa-glx
RUN apt-get install -y libglib2.0-0
RUN apt-get install -y libenchant1c2a
RUN mkdir embeddings
COPY . .
RUN curl -L http://nlp.stanford.edu/data/glove.6B.zip --output glove.zip
RUN unzip -o glove.zip -d embeddings/
RUN . .env/bin/activate && python nltk_install.py
CMD . .env/bin/activate && python main.py
Changes made to the filesystem at run time are not stored in the Docker image. They exist only in the container created from the image, and every 'docker run' command creates a new container.
I built the image from ubuntu:18.04 and installed Python.
However, when I set this in docker-compose:
command: python manage.py runserver
it shows a path error.
Maybe I didn't set the path? How do I set the path for the Docker user?
ERROR: for django Cannot start service django: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"python\": executable file not found in $PATH": unknown
ERROR: Encountered errors while bringing up the project.
FROM ubuntu:18.04
ENV PYTHONUNBUFFERED 1
RUN apt-get -y update
RUN apt-get -y install emacs wget
RUN apt-get -y install apache2-dev mysql-client
RUN apt-get -y install mysql-server libmysqlclient-dev
RUN apt-get install -y software-properties-common
RUN add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get install -y python3.7
RUN apt-get install -y python-pip
RUN pip install uwsgi django mysqlclient tensorflow_hub django-mysql django-extensions djangorestframework django-filter requests_oauthlib mecab-python3 neologdn gensim janome --no-input
RUN pip install keras tensorflow==1.14.0 --no-cache-dir --no-input
RUN mkdir /code
WORKDIR /code
ADD ./src /code/
You can solve this in two ways (both work for me):
In docker-compose, add:
command: bash -c 'python manage.py runserver'
Or add a CMD instruction to your Dockerfile:
CMD python manage.py runserver
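The error message says that no executable named `python` is on $PATH in the container. One way to see which interpreter names the image actually provides is to look each one up, as in this sketch. The helper name and the candidate list are assumptions for illustration, and the script itself has to be started with an interpreter that does exist, for example `python3`.

```python
import shutil


def available_interpreters(candidates=("python", "python3", "python3.7"),
                           which=shutil.which):
    """Map each candidate command name to its full path, or None if it is
    not on PATH (hypothetical helper; `which` is injectable for testing)."""
    return {name: which(name) for name in candidates}


if __name__ == "__main__":
    for name, path in available_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If only `python3` or `python3.7` resolves, using that name in `command:` or `CMD` would be one way to avoid the error.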
I'm using the python:3.7-slim-buster Docker image for my Django project. Now I want to use Django's geo features, which apparently require GDAL. So I added a RUN apt-get install step for GDAL, and it fails with "E: Unable to locate package gdal-bin".
Here is my docker file:
FROM python:3.7-slim-buster
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# DB vars
ENV DB_USER_NAME ${DB_USER_NAME}
ENV DB_NAME ${DB_NAME}
ENV DB_HOST ${DB_HOST}
ENV DB_PORT ${DB_PORT}
ENV DB_PASSWORD ${DB_PASSWORD}
ENV DJANGO_SECRET_KEY ${DJANGO_SECRET_KEY}
RUN apt-get install -y gdal-bin python-gdal python3-gdal
RUN ["adduser", "${USER_NAME}", "--disabled-password", "--ingroup", "www-data", "--quiet"]
USER ${USER_NAME}
ADD ${PROJECT_NAME}/ /home/${USER_NAME}/${PROJECT_NAME}
WORKDIR /home/${USER_NAME}/${PROJECT_NAME}
ENV PATH="/home/${USER_NAME}/.local/bin:\${PATH}:/usr/local/python3/bin"
RUN pip install --user -r requirements.txt
CMD python manage.py runserver 0.0.0.0:9000
#CMD gunicorn ${PROJECT_NAME}.wsgi:application --bind 0.0.0.0:8000
EXPOSE 8000
You need to do the following:
RUN apt-get update
RUN apt-get install -y software-properties-common && apt-get update
RUN apt-get install -y python3.7-dev
RUN add-apt-repository -y ppa:ubuntugis/ppa && apt-get update
RUN apt-get install -y gdal-bin libgdal-dev
ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal
RUN pip install GDAL
If you can use another base image, here is one with GDAL already installed:
FROM osgeo/gdal:ubuntu-small-3.2.0
That's because your image doesn't have a repository that contains the gdal-bin package. You have to add the repository (you can see the guideline here) and then install it:
RUN add-apt-repository -y ppa:ubuntugis/ppa && apt-get update && apt-get install -y gdal-bin python-gdal python3-gdal
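After the image builds, it can help to confirm that the GDAL shared library is visible to Python before starting Django. This is a minimal sketch using the standard library's `ctypes.util.find_library`. The helper name `shared_library_available` is hypothetical, and `"gdal"` as the library name is an assumption that may differ between distributions.

```python
import ctypes.util


def shared_library_available(name):
    """Return True if the dynamic linker can locate a library called `name`."""
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    if shared_library_available("gdal"):
        print("GDAL shared library found")
    else:
        print("GDAL shared library not found; check the apt-get install step")
```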
Docker fails to build from my Dockerfile with a symlink error. I tried changing the base image from python:2.7 to ubuntu:latest, but the issue persists. I am running Docker for Mac and am stuck on this issue.
My dockerfile:
FROM python:2.7
RUN apt-get update && apt-get install -y tdsodbc unixodbc-dev && apt-get clean -y
ADD odbcinst.ini /etc/odbcinst.ini
ADD requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
ADD . /opt/webapp/
WORKDIR /opt/webapp
CMD python server.py
Docker build command:
docker build --no-cache .
Docker build command output:
Sending build context to Docker daemon 231.2 MB
Step 1/8 : FROM python:2.7
---> ca388cdb5ac1
Step 2/8 : RUN apt-get update && apt-get install -y tdsodbc unixodbc-dev && apt-get clean -y
symlink ../7e0c72321f41b854d5180eac25a4cad71bb8ce4dfed8b6f4b9f0b568a608013a-init/diff /var/lib/docker/overlay2/l/D734HIV6BPIVR3LJZWGYVQZBKD: no such file or directory