Pass Windows environment variables to dockerized Python app - python

I am running a Python application that reads two paths from Windows env vars and then uses the executables in those paths to do OCR on some documents. Since the POPPLER and TESSERACT env vars are already set in Windows, this Python snippet works for me:
popplerPath = os.environ.get('POPPLER')
tesseractPath = os.environ.get('TESSERACT')
Now I am trying to dockerize the app, and, to my understanding, since my container will need access to those paths, I need to mount them as volumes at run time. My Dockerfile looks like this:
FROM python:3.7.7-slim
WORKDIR ./
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY documents/ .
COPY src/ ./src
CMD [ "python", "./src/run.py" ]
I build the image using:
docker build -t ocr .
And I try to run my container using:
docker run -v %POPPLER%:%POPPLER% -v %TESSERACT%:%TESSERACT% ocr
... but my app still gets None for these paths and can't use the executables. Is my approach correct, and beyond that, is it good dev practice?

See the docs; the switch for passing an environment variable is -e:
$ docker run -e MYVAR1 --env MYVAR2=foo --env-file ./env.list ubuntu bash
and in a Dockerfile, you can use
ENV FOO=/bar
If I understand your setup correctly, your paths are mounted in the container at the same path as on the host. The only problem is your Python script, which expects the paths to be provided by environment variables; these will not exist unless you pass them from your host system to your container.
Once you have verified that the volume mounted with -v is there correctly, you can try
docker run -v %POPPLER%:%POPPLER% -v %TESSERACT%:%TESSERACT% --env POPPLER=%POPPLER% --env TESSERACT=%TESSERACT% ocr
or, if you always run it this way, consider putting them in your Dockerfile to save some keystrokes.
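For example, a minimal sketch of baking them in (the /usr/bin values here are placeholder container paths, not verified install locations):
ENV POPPLER=/usr/bin
ENV TESSERACT=/usr/bin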

Any executable you call must be built into the image. Containers can't usually call executables on the host or in other containers. In the specific example you show, a Linux container can't run a Windows executable, even if you do use a bind mount to inject it into the container.
The "slim" python images are built on Debian GNU/Linux, and you need to use its APT tool to install these executable dependencies in your Dockerfile. (https://www.debian.org/distrib/packages has a search box to help you find the right package name; Ubuntu Linux also uses Debian packages.)
FROM python:3.7-slim
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive \
    apt-get install -y \
      poppler-utils \
      tesseract-ocr-all
COPY requirements.txt .
...
I'd suggest putting reasonable defaults in your code for when these environment variables aren't set. The apt-get install command will put the executables on the system PATH inside the image.
popplerPath = os.environ.get('POPPLER', 'poppler')
tesseractPath = os.environ.get('TESSERACT', 'tesseract')
If you really need them as environment variables, you could use the Dockerfile ENV directive:
ENV POPPLER=poppler TESSERACT=tesseract
Environment variables from the host don't automatically get passed through to the container; you need a Dockerfile ENV or docker run -e option. Also remember that the container has an isolated filesystem (and Windows-syntax paths don't make sense in Linux containers) so these environment variables would need to be container paths, the second half of your proposed docker run -v option.
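Putting that together, a sketch of what the run command could look like once the tools are installed inside the image (assuming, hypothetically, that they ended up under /usr/bin):
docker run -e POPPLER=/usr/bin -e TESSERACT=/usr/bin ocr
Note there are no -v mounts and no Windows-style %VAR% values here; both paths refer to locations inside the container's own filesystem.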

Docker run does not produce any endpoint

I am trying to dockerize this repo. After building it like so:
docker build -t layoutlm-v2 .
I try to run it like so:
docker run -d -p 5001:5000 layoutlm-v2
It downloads the necessary libraries and packages, and then... nothing. No errors, no endpoints generated, just radio silence.
What's wrong? And how do I fix it?
You appear to be expecting your application to offer a service on port 5000, but that doesn't seem to be how your code behaves.
Looking at your code, you seem to be launching a service using gradio. According to the quickstart, calling gr.Interface(...).launch() will launch a service on localhost:7860, and indeed, if we inspect a container booted from your image, we see:
root@74cf8b2463ab:/app# ss -tln
State    Recv-Q   Send-Q     Local Address:Port       Peer Address:Port   Process
LISTEN   0        2048           127.0.0.1:7860            0.0.0.0:*
There's no way to access a service listening on localhost from outside the container, so we need to figure out how to fix that.
Looking at these docs, it looks like you can control the listen address using the server_name parameter:
server_name
to make app accessible on local network, set this to "0.0.0.0". Can be set by environment variable GRADIO_SERVER_NAME. If None, will use "127.0.0.1".
So if we run your image like this:
docker run -p 7860:7860 -e GRADIO_SERVER_NAME=0.0.0.0 layoutlm-v2
Then we should be able to access the interface on the host at http://localhost:7860/... and indeed, that seems to work.
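Alternatively, you can set the listen address in code rather than through the environment variable. A minimal sketch (the predict function is a stand-in for your actual handler):
import gradio as gr

def predict(text):
    return text  # placeholder for the real model call

# server_name="0.0.0.0" listens on all interfaces, so the published
# port is reachable from outside the container
gr.Interface(fn=predict, inputs="text", outputs="text").launch(
    server_name="0.0.0.0", server_port=7860
)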
Unrelated to your question:
You're setting up a virtual environment in your Dockerfile, but you're not using it, primarily because of a typo here:
ENV PATH="VIRTUAL_ENV/bin:$PATH"
You're missing a $ on $VIRTUAL_ENV.
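The corrected line should read:
ENV PATH="$VIRTUAL_ENV/bin:$PATH"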
You could optimize the order of operations in your Dockerfile. Right now, making a simple change to your Dockerfile (e.g, editing the CMD setting) will cause much of your image to be rebuilt. You could avoid that by restructuring the Dockerfile like this:
FROM python:3.9
# Install dependencies
RUN apt-get update && apt-get install -y tesseract-ocr
RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2
COPY . /app
# Run the application:
CMD ["python", "-u", "app.py"]

Python script that runs inside a Docker image doesn't use the usual PYTHONPATH

I'm creating a docker image using the following Dockerfile:
FROM python:3.7
RUN apt-get update && pip install sagemaker boto3 numpy sagemaker-training
# Copies the training code inside the container
COPY cv.py /opt/ml/code/train.py
COPY scikit_learn_iris.py /opt/ml/code/scikit_learn_iris.py
# Defines train.py as script entrypoint
ENV SAGEMAKER_PROGRAM train.py
# Install custom packages specified in requirements.txt
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENV PYTHONPATH "/usr/local/lib/python3.7/site-packages"
In the requirements file I have added the lightgbm library, and it installs successfully inside the Docker image. But when SageMaker starts to run scikit_learn_iris.py, it fails because it can't import lightgbm: ModuleNotFoundError: No module named 'lightgbm'. I'm printing sys.path and PYTHONPATH at the start of the scikit_learn_iris.py script and it shows the following results:
sys.path = ['/opt/ml/code', '/opt/ml/code', '/miniconda3/bin', '/miniconda3/lib/python37.zip', '/miniconda3/lib/python3.7', '/miniconda3/lib/python3.7/lib-dynload', '/miniconda3/lib/python3.7/site-packages']
PYTHONPATH = ['/opt/ml/code', '/miniconda3/bin', '/miniconda3/lib/python37.zip', '/miniconda3/lib/python3.7', '/miniconda3/lib/python3.7/lib-dynload', '/miniconda3/lib/python3.7/site-packages']
Why is the script using /miniconda3/... to find the libraries, even though I'm setting the PYTHONPATH env variable in the Dockerfile? How do I make it look in the correct path? The /miniconda3/ path doesn't even exist in the Docker image when I check (using docker run -it IMAGE_NAME bash).
I don't see where /miniconda3 would come from in your Dockerfile; it possibly came from a base image, if one was involved. I don't know if this helps, but this is the printenv output from my training image. I didn't have to set PYTHONPATH, and it works without issue.
SAGEMAKER_SUBMIT_DIRECTORY=/opt/ml/code
OLDPWD=/src
DEBIAN_FRONTEND=noninteractive
PATH=/opt/amazon/openmpi/nvidia/bin:/opt/amazon/openmpi/bin/:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TA_PREFIX=/opt/ta-lib-core
SAGEMAKER_PROGRAM=main.py
LC_ALL=C.UTF-8
KMP_AFFINITY=granularity=fine,compact,1,0
LD_LIBRARY_PATH=/usr/local/lib:/opt/amazon/openmpi/lib/:/opt/amazon/efa/lib/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/lib
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-2
PYTHONDONTWRITEBYTECODE=TRUE
TF_AUTOTUNE_THRESHOLD=2
NVARCH=x86_64
SHLVL=1
PYTHONIOENCODING=UTF-8
PYTHON=python3.9
DEBCONF_NONINTERACTIVE_SEEN=true
LESSOPEN=| /usr/bin/lesspipe %s
KMP_BLOCKTIME=1
TERM=xterm
LESSCLOSE=/usr/bin/lesspipe %s %s
TF_VERSION=2.10
KMP_SETTINGS=0
CUDA_VERSION=11.2.2
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LANG=C.UTF-8
HOME=/root
NV_CUDA_CUDART_VERSION=11.2.152-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
RDMAV_FORK_SAFE=1
PWD=/src/pybox2d
NVIDIA_REQUIRE_CUDA=cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451
PYTHON_VERSION=3.9.10
HOSTNAME=8b1eccd1c2ad
PIP=pip3
PYTHONUNBUFFERED=TRUE
NVIDIA_VISIBLE_DEVICES=all
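One way to see which interpreter and search path a container actually uses is a small diagnostic script; a sketch (diagnose.py is a hypothetical name, copied or bind-mounted into the image):
# diagnose.py -- print the interpreter and module search path actually in use
import sys
print("executable:", sys.executable)  # e.g. /miniconda3/bin/python vs /usr/local/bin/python
print("sys.path:")
for p in sys.path:
    print("  ", p)
If the executable is not the one from your base image, then whatever launches your script (here, SageMaker) is choosing the interpreter, and the PYTHONPATH from your Dockerfile is being overridden at that point.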

how to pass a volume in a Dockerfile for an application to be pushed to Docker Hub

FROM python:3
WORKDIR /Users/vaibmish/Documents/new/graph-report
RUN pip install graphreport==1.2.1
CMD [ cd /Users/vaibmish/Documents/new/graph-report/graphreport_metrics ]
CMD [ graphreport ]
This is part of my Dockerfile. I wish to remove the hard-coded cd path from the file and have something like -v instead, so that whoever runs it can supply their own volume path.
The line
CMD [ cd /Users/vaibmish/Documents/new/graph-report/graphreport_metrics ]
is wrong. You achieve the same with WORKDIR:
WORKDIR /Users/vaibmish/Documents/new/graph-report/graphreport_metrics
WORKDIR creates the path if it doesn't exist and then changes the current directory to that path (same as mkdir -p /path/new && cd /path/new)
You can also declare the path as a volume and instruct whoever runs the container to provide their own path (docker run -v host_path:container_path ...)
VOLUME /Users/vaibmish/Documents/new/graph-report
A final note: It looks like these paths are from the host. Remember that the paths in the Dockerfile are not host paths. They are paths inside the container.
Typical practice here is to pick some fixed path inside the Docker container. It should be a different path from where your application is installed; it does not need to match any particular host path at all.
FROM python:3
RUN pip3 install graphreport==1.2.1
WORKDIR /data
CMD ["graphreport"]
docker build -t me/graphreport:1.2.1 .
docker run --rm \
  -v /Users/vaibmish/Documents/new/graph-report:/data \
  me/graphreport:1.2.1
(Remember that only the last CMD has an effect, and if it's not a well-formed JSON array, Docker will interpret it as a shell command. What you show in the question would run the test(1) command and not the program you're installing.)
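To illustrate the difference between the two CMD forms (a sketch):
CMD graphreport          # shell form: Docker runs /bin/sh -c 'graphreport'
CMD ["graphreport"]      # exec form: Docker runs the binary directly
The bracketed-but-not-JSON line in the question falls back to the shell form, so the [ reaches the shell as its test(1) builtin rather than as array syntax.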
If you're trying to install a single package from PyPI and just run it on local files, a Python virtual environment will be much easier to set up than anything based on Docker, and will essentially work as you expect:
python3 -m venv graphreport
. graphreport/bin/activate
pip3 install graphreport==1.2.1
cd /Users/vaibmish/Documents/new/graph-report
graphreport
deactivate # switch back to system Python/pip
All of the installed Python code is inside the graphreport virtual environment directory, and if you don't need this application again, you can just delete the directory tree.

Docker file for running a Python program with parameters

I'm new to Docker. I have a Python program that I run in the following way.
python main.py --s=aws --d=scylla --p=4 --b=15 --e=local -w
Please note the double hyphen -- for the first four parameters and the single hyphen - for the last one.
I'm trying to run this inside a Docker container. Here's my Dockerfile:
FROM python:3.6
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python","app.py","--s","(source)", "--d","(database)","--w", "(workers)", "--b", "(bucket)", "--e", "(env)", "-w"]
I'm not sure if this will work, as I don't know exactly how to test and run it. I want to run the Docker image with the following port mappings.
docker run --name=something -p 9042:9042 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9160:9160 -p 9180:9180 -p 10000:10000 -d user/something
How can I correct the Dockerfile? And once I build the image, how do I run it?
First, fix the Dockerfile:
FROM python:3.6
COPY . /app
WORKDIR /app
# optional: it is better to chain commands to reduce the number of created layers
RUN pip install --upgrade pip \
 && pip install --no-cache-dir -r requirements.txt
# mandatory: "--s=smth" is one argument
# optional: it's better to use environment variables for source, database etc
CMD ["python","app.py","--s=(source)", "--d=(database)","--w=(workers)", "--b=(bucket)", "--e=(env)", "-w"]
then, build it:
docker build -f "<dockerfile path>" -t "<tag to assign>" "<build dir (eg .)>"
Then, you can just use the assigned tag as an image name:
docker run ... <tag assigned>
UPD: I got it wrong the first time; the tag should be used as the image name when running, not as the instance name.
UPD2: In the first version of this answer, I assumed you were going to hardcode the parameters and only mentioned that it is better to use environment variables. Here is an example of how to do it:
The quickest (and dirtiest) way is to replace CMD with something like:
CMD ["sh", "-c", "python app.py --s=$SOURCE --d=$DATABASE --w=$WORKERS ... -w"]
(it is common to use CAPS names for environment variables)
It is better, however, to read environment variables directly in your Python script instead of command-line arguments, or to use them as defaults:
# somewere in app.py
import os
...
DATABASE = os.environ.get('DATABASE', default_value)  # can default to args.d
SOURCE = os.environ.get('SOURCE') # None by default
# etc
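If you'd rather keep the command-line interface and only fall back to environment variables, a sketch using argparse (the flag names come from your command; SOURCE and DATABASE are assumed variable names):
# somewhere in app.py
import argparse
import os

parser = argparse.ArgumentParser()
# environment variables supply the defaults; explicit flags still win
parser.add_argument('--s', default=os.environ.get('SOURCE'))
parser.add_argument('--d', default=os.environ.get('DATABASE'))
args = parser.parse_args()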
Don't forget to update the Dockerfile as well in this case:
# Dockerfile:
...
CMD ["python","app.py"]
Finally, pass environment variables to your run command:
docker run --name=something ... -e DATABASE=<dbname> -e SOURCE=<source> ... <tag assigned at build>
There are more ways to pass environment variables; I'll just refer to the official documentation here:
https://docs.docker.com/compose/environment-variables/
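For instance, an env file keeps the run command short (the file name and values below are placeholders):
# env.list -- one VAR=value per line, no quoting
SOURCE=aws
DATABASE=scylla
WORKERS=4
docker run --name=something --env-file ./env.list <tag assigned at build>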

Docker - do we need to include the RUN command in the Dockerfile?

I have some Python code, and to build it into a Docker image I can use the command below:
sudo docker build -t customdocker .
This builds the Python code into a Docker image. For the build I use a Dockerfile with the commands below:
FROM python:3
ADD my_script.py /
ADD user.conf /srv/config/conf.d/
RUN pip3 install <some-package>
CMD [ "python3", "./my_script.py" ]
In this, we have a RUN command which installs the required packages. Let's say we have deleted the image for some reason and want to build it again: is there any way we can skip this RUN step to save some time, since I think the packages were already installed?
Also, my code uses a file user.conf which is in another directory. For that I am including it in the Dockerfile and also saving a copy of it in the current directory. Is there a way in Docker to define my working directory so that the image searches for the file inside those directories?
Thanks
No, you cannot remove the RUN or other statements from the Dockerfile if you want to build the image again after deleting it.
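That said, Docker's layer cache means a rebuild usually doesn't repeat the work (as long as the build cache itself hasn't been pruned): if nothing above a RUN step has changed, the cached layer is reused. Putting rarely-changing steps first exploits this; a sketch based on your Dockerfile:
FROM python:3
# changes rarely -> layer is cached across rebuilds
RUN pip3 install <some-package>
# changes often -> keep last so edits don't invalidate the pip layer
ADD user.conf /srv/config/conf.d/
ADD my_script.py /
CMD [ "python3", "./my_script.py" ]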
You can use the WORKDIR instruction in your Dockerfile, but its scope is within the Docker image, i.e. when you create a container from the image, the working directory will be set to the one mentioned in WORKDIR.
For example:
WORKDIR /srv/config/conf.d/
This sets /srv/config/conf.d/ as the working directory, but you still have to use the line below in your Dockerfile while building, in order to copy the file to that location:
ADD user.conf /srv/config/conf.d/
Answering your first question: a Docker image holds everything related to your Python environment, including the packages you install. When you delete the image, the packages are deleted with it. Therefore, no, you cannot skip that step.
Now on to your second question: you can bind a directory when starting the container with:
docker run -v /directory-you-want-to-mount:/src/config/ customdocker
You can also set the working directory with the -w flag.
docker run -w /path/to/dir/ -i -t customdocker
https://docs.docker.com/v1.10/engine/reference/commandline/run/
