Docker run python script can't find locally installed module

For context, this problem relates to a docker image that will be run using azure batch.
Here is the Dockerfile, in full:
FROM continuumio/miniconda3
ADD . /pipegen
ADD environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml
RUN echo "conda activate $(head -1 /tmp/environment.yml | cut -d' ' -f2)" >> ~/.bashrc
ENV PATH /opt/conda/envs/$(head -1 /tmp/environment.yml | cut -d' ' -f2)/bin:$PATH
ENV CONDA_DEFAULT_ENV $(head -1 /tmp/environment.yml | cut -d' ' -f2)
ADD classify.py /classify.py
RUN rm -rf /pipegen
pipegen is the local module (in the same directory as the Dockerfile) that is installed via the environment.yml file. Here is the environment.yml file in full:
name: pointcloudz
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - python-pdal
  - entwine
  - matplotlib
  - geopandas
  - notebook
  - azure-storage-blob==1.4.0
  - pip:
    - /pipegen
    - azure-batch==6.0.0
For clarity, the directory structure looks like this:
Dockerfile
pipegen
\__ __init__.py
\__ pipegen.py
\__ utils.py
classify.py
batch_containers.py
environment.yml
setup.py
The Dockerfile establishes the environment created using the environment.yml file as the default (conda) python environment when the container is run. Therefore, I can run the container interactively as follows:
docker run -it pdalcontainers.azurecr.io/pdalcontainers/pdal-pipelines
and, from inside the container, execute the classify.py script with some command line arguments, as follows:
python classify.py in.las out.las --defaults
and the script is executed as expected. However, when I run the following command, attempting to execute the very same script from "outside" the container,
docker run -it pdalcontainers.azurecr.io/pdalcontainers/pdal-pipelines python classify.py in.las out.las --defaults
I get the following error:
File "classify.py", line 2, in <module>
from pipegen.pipegen import build_pipeline, write_las
ModuleNotFoundError: No module named 'pipegen'
Just to be clear, the classify.py script imports pipegen, the local module which is now installed in the conda environment created in the Dockerfile. I need to be able to execute the script using the docker run command above due to constraints in how Azure batch runs jobs. I've tried multiple fixes but am now pretty stuck. Any wisdom would be greatly appreciated!

The problem you are facing is that you added the conda activate line to ~/.bashrc, which is only sourced when an interactive bash shell starts. When you run the container interactively, that is what you get. However, when you invoke the python script directly, no interactive shell is started, so your conda environment is never activated.
One thing you could do is skip the conda activate and instead run the script with conda run. To simplify the command line, add this entrypoint to your Dockerfile:
ENTRYPOINT ["conda", "run", "-n", "$CONDA_DEFAULT_ENV", "python", "classify.py"]
Using this in the entrypoint also allows the caller to pass command-line arguments via docker run.
From the Dockerfile reference
Command line arguments to docker run will be appended after all elements in an exec form ENTRYPOINT
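For example, with that entrypoint in place (image name taken from the question), the script's arguments go straight after the image name:
docker run pdalcontainers.azurecr.io/pdalcontainers/pdal-pipelines in.las out.las --defaults
which ends up running conda run -n pointcloudz python classify.py in.las out.las --defaults inside the container.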
For a more detailed explanation, see Activating a Conda environment in your Dockerfile

Related

Python script that runs inside a Docker image doesn't use the usual PYTHONPATH

I'm creating a docker image using the following Dockerfile:
FROM python:3.7
RUN apt-get update && pip install sagemaker boto3 numpy sagemaker-training
# Copies the training code inside the container
COPY cv.py /opt/ml/code/train.py
COPY scikit_learn_iris.py /opt/ml/code/scikit_learn_iris.py
# Defines train.py as script entrypoint
ENV SAGEMAKER_PROGRAM train.py
# Install custom packages specified in requirements.txt
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENV PYTHONPATH "/usr/local/lib/python3.7/site-packages"
In the requirements file, I have added the lightgbm library and it installs successfully inside the docker image. When SageMaker starts to run scikit_learn_iris.py, it fails because it can't import lightgbm: ModuleNotFoundError: No module named 'lightgbm'. I'm printing sys.path and PYTHONPATH at the start of the scikit_learn_iris.py script and it shows the following results:
sys.path = ['/opt/ml/code', '/opt/ml/code', '/miniconda3/bin', '/miniconda3/lib/python37.zip', '/miniconda3/lib/python3.7', '/miniconda3/lib/python3.7/lib-dynload', '/miniconda3/lib/python3.7/site-packages']
PYTHONPATH = ['/opt/ml/code', '/miniconda3/bin', '/miniconda3/lib/python37.zip', '/miniconda3/lib/python3.7', '/miniconda3/lib/python3.7/lib-dynload', '/miniconda3/lib/python3.7/site-packages']
Why is the script using /miniconda3/... to find the libraries, even though I'm setting the PYTHONPATH env variable in the Dockerfile? How do I make it look in the correct path? This /miniconda3/ path doesn't even exist in the docker image when I checked (using docker run -it IMAGE_NAME bash).
I don't see where /miniconda3 comes from in your Dockerfile; it possibly came from a base image, if there was one. And I don't know if this helps, but this is the printenv output from my training image. I didn't have to set PYTHONPATH and it's working without issue.
SAGEMAKER_SUBMIT_DIRECTORY=/opt/ml/code
OLDPWD=/src
DEBIAN_FRONTEND=noninteractive
PATH=/opt/amazon/openmpi/nvidia/bin:/opt/amazon/openmpi/bin/:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TA_PREFIX=/opt/ta-lib-core
SAGEMAKER_PROGRAM=main.py
LC_ALL=C.UTF-8
KMP_AFFINITY=granularity=fine,compact,1,0
LD_LIBRARY_PATH=/usr/local/lib:/opt/amazon/openmpi/lib/:/opt/amazon/efa/lib/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/lib
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-2
PYTHONDONTWRITEBYTECODE=TRUE
TF_AUTOTUNE_THRESHOLD=2
NVARCH=x86_64
SHLVL=1
PYTHONIOENCODING=UTF-8
PYTHON=python3.9
DEBCONF_NONINTERACTIVE_SEEN=true
LESSOPEN=| /usr/bin/lesspipe %s
KMP_BLOCKTIME=1
TERM=xterm
LESSCLOSE=/usr/bin/lesspipe %s %s
TF_VERSION=2.10
KMP_SETTINGS=0
CUDA_VERSION=11.2.2
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LANG=C.UTF-8
HOME=/root
NV_CUDA_CUDART_VERSION=11.2.152-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
RDMAV_FORK_SAFE=1
PWD=/src/pybox2d
NVIDIA_REQUIRE_CUDA=cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451
PYTHON_VERSION=3.9.10
HOSTNAME=8b1eccd1c2ad
PIP=pip3
PYTHONUNBUFFERED=TRUE
NVIDIA_VISIBLE_DEVICES=all
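If it helps narrow things down, a quick diagnostic (not part of the original script) is to print sys.executable alongside sys.path at the top of scikit_learn_iris.py:
import sys

# Show which Python binary is running this script and where it looks for packages
print("executable:", sys.executable)
print("sys.path:", sys.path)
If sys.executable points into /miniconda3, a different interpreter than the one pip installed lightgbm into is running the script, and (judging by the printed values) the PYTHONPATH set in the Dockerfile is being overridden at run time.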

Pass Windows environment variables to dockerized Python app

I am running a Python application that reads two paths from Windows env vars and uses the executables in those paths to do OCR on some documents. Since the POPPLER and TESSERACT env vars are already set in Windows, this Python snippet works for me:
popplerPath = os.environ.get('POPPLER')
tesseractPath = os.environ.get('TESSERACT')
Now I am trying to dockerize the app, and, to my understanding, since my container will need access to those paths, I need to mount them using VOLUME during run. My dockerfile looks like this:
FROM python:3.7.7-slim
WORKDIR ./
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY documents/ .
COPY src/ ./src
CMD [ "python", "./src/run.py" ]
I build the image using:
docker build -t ocr .
And I try to run my container using:
docker run -v %POPPLER%:%POPPLER% -v %TESSERACT%:%TESSERACT% ocr
... but my app still gets a None value for these paths and can't use the executable files. Is my approach correct and beyond that, is it a good dev practice?
See the docs; the switch for environment variables is -e:
$ docker run -e MYVAR1 --env MYVAR2=foo --env-file ./env.list ubuntu bash
and in dockerfile, you can use
ENV FOO=/bar
If I understand your statement correctly, your paths are mounted in the container at the same paths as on the host. The only problem is your Python script, which expects the paths to be provided by environment variables; these will not exist unless you pass them from your host system into the container.
Once you have verified that your -v mounts are in place, you can try
docker run -v %POPPLER%:%POPPLER% -v %TESSERACT%:%TESSERACT% --env POPPLER=%POPPLER% --env TESSERACT=%TESSERACT% ocr
or, if you always run it this way, you can consider putting them in your Dockerfile to save some keystrokes.
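For example, if the tools end up at /opt/poppler and /opt/tesseract inside the container (placeholder paths), the Dockerfile could set:
ENV POPPLER=/opt/poppler
ENV TESSERACT=/opt/tesseract
so the script still finds a value even when no -e flags are passed at run time.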
Any executable you call must be built into the image. Containers can't usually call executables on the host or in other containers. In the specific example you show, a Linux container can't run a Windows executable, even if you do use a bind mount to inject it into the container.
The "slim" python images are built on Debian GNU/Linux, and you need to use its APT tool to install these executable dependencies in your Dockerfile. (https://www.debian.org/distrib/packages has a search box to help you find the right package name; Ubuntu Linux also uses Debian packages.)
FROM python:3.7-slim
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y \
poppler-utils \
tesseract-ocr-all
COPY requirements.txt .
...
I'd suggest putting reasonable defaults in your code if these environment variables aren't set. The apt-get install command will put them in the system path inside the image.
popplerPath = os.environ.get('POPPLER', 'poppler')
tesseractPath = os.environ.get('TESSERACT', 'tesseract')
If you really need them as environment variables you could use the Dockerfile ENV directive
ENV POPPLER=poppler TESSERACT=tesseract
Environment variables from the host don't automatically get passed through to the container; you need a Dockerfile ENV or docker run -e option. Also remember that the container has an isolated filesystem (and Windows-syntax paths don't make sense in Linux containers) so these environment variables would need to be container paths, the second half of your proposed docker run -v option.
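Putting that together, a sketch of the run command with container-side values (the paths are illustrative placeholders):
docker run -v %POPPLER%:/opt/poppler -v %TESSERACT%:/opt/tesseract -e POPPLER=/opt/poppler -e TESSERACT=/opt/tesseract ocr
though, as noted above, mounting Windows executables into a Linux container still won't make them runnable; installing the Linux packages in the image is the more reliable route.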

How to pass a volume in a Dockerfile for an application to be pushed to Docker Hub

FROM python:3
WORKDIR /Users/vaibmish/Documents/new/graph-report
RUN pip install graphreport==1.2.1
CMD [ cd /Users/vaibmish/Documents/new/graph-report/graphreport_metrics ]
CMD [ graphreport ]
This is part of the Dockerfile.
I wish to remove the hard-coded cd path from the file and use something like -v instead, so that whoever runs the container can provide their own volume path.
The line
CMD [ cd /Users/vaibmish/Documents/new/graph-report/graphreport_metrics ]
is wrong. You achieve the same with WORKDIR:
WORKDIR /Users/vaibmish/Documents/new/graph-report/graphreport_metrics
WORKDIR creates the path if it doesn't exist and then changes the current directory to that path (same as mkdir -p /path/new && cd /path/new)
You can also declare the path as a volume and instruct who runs the container to provide their own path (docker run -v host_path:container_path ...)
VOLUME /Users/vaibmish/Documents/new/graph-report
A final note: It looks like these paths are from the host. Remember that the paths in the Dockerfile are not host paths. They are paths inside the container.
Typical practice here is to pick some fixed path inside the Docker container. It should be a different path from where your application is installed; it does not need to match any particular host path at all.
FROM python:3
RUN pip3 install graphreport==1.2.1
WORKDIR /data
CMD ["graphreport"]
docker build -t me/graphreport:1.2.1 .
docker run --rm \
-v /Users/vaibmish/Documents/new/graph-report:/data \
me/graphreport:1.2.1
(Remember that only the last CMD has an effect, and if it's not a well-formed JSON array, Docker will interpret it as a shell command. What you show in the question would run the test(1) command and not the program you're installing.)
If you're trying to install a single package from PyPI and just run it on local files, a Python virtual environment will be much easier to set up than anything based on Docker, and will essentially work as you expect:
python3 -m venv graphreport
. graphreport/bin/activate
pip3 install graphreport==1.2.1
cd /Users/vaibmish/Documents/new/graph-report
graphreport
deactivate # switch back to system Python/pip
All of the installed Python code is inside the graphreport virtual environment directory, and if you don't need this application again, you can just delete the directory tree.

What should I put for Docker CMD and ENTRYPOINT for Flask app running "python myapp.py images/*"

I am trying to run a Flask app using Docker.
Normally, to execute the Flask app, I run this inside of my Terminal:
python myapp.py images/*
I am unsure of how to convert that to Docker CMD syntax (or if I need to edit ENTRYPOINT).
Here is my docker file:
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential hdf5-tools
COPY . ~/myapp/
WORKDIR ~/myapp/
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["myapp.py"]
Inside of requirements.txt:
flask
numpy
h5py
tensorflow
keras
When I run the docker image:
person#person:~/Projects/$ docker run -d -p 5001:5000 myapp
19645b69b68284255940467ffe81adf0e32a8027f3a8d882b7c024a10e60de46
docker ps:
Up 24 seconds 0.0.0.0:5001->5000/tcp hardcore_edison
When I got to localhost:5001 I get no response.
Is it an issue with my CMD parameter?
EDIT:
New Dockerfile:
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential hdf5-tools
COPY . ~/myapp/
WORKDIR ~/myapp/
EXPOSE 5000
RUN pip install -r requirements.txt
CMD ["python myapp.py images/*.jpg "]
With this new configuration, when I run:
docker run -d -p 5001:5000 myapp
I get:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"python myapp.py images/*.jpg \": stat python myapp.py images/*.jpg : no such file or directory": unknown.
When I run:
docker run -d -p 5001:5000 myapp python myapp.py images/*.jpg
I get the Docker image to run, but now when I go to localhost:5001, it complains that the connection was reset.
I'm glad you've already solved this issue. I put up this answer just for those who still have the same confusion about the ENTRYPOINT and CMD instructions.
In a Dockerfile, ENTRYPOINT and CMD are two similar instructions, but there are strong differences between them. The most important one (at least to me) is that CMD can be overridden at run time but ENTRYPOINT cannot (short of the --entrypoint flag).
To explain this, consider the command below:
docker run -tid --name=container_name image_name [command]
As we can see, command is optional, and if present it overrides the CMD defined in the Dockerfile.
Let's get back to your issue. You have two ways to achieve your purpose:
1. ENTRYPOINT ["python"] and CMD ["/path/to/myapp.py", "/path/to/images/*.jpg"]
2. CMD python /path/to/myapp.py /path/to/images/*.jpg (this is the form mentioned by @David Maze above)
To understand the first one, think of CMD as the default arguments for ENTRYPOINT.
A simple example below.
Dockerfile-->
FROM ubuntu:18.04
ENTRYPOINT ["cat"]
CMD ["/etc/hosts"]
Build image named test-cmd-show and start a container from it.
docker run test-cmd-show
This would show the content of the /etc/hosts file. Going further:
docker run test-cmd-show /etc/resolv.conf
And this would show us the content of the /etc/resolv.conf file. One more:
docker run test-cmd-show --help
This would show the help information for command cat.
Fantastic, right?
We could explore this functionality further.
A related question: What's the difference between CMD and ENTRYPOINT?
The important thing is that you need a shell to expand your command line, so I’d write
CMD python myapp.py images/*
When you just write CMD like this (without the not-really-JSON brackets and quotes) Docker will implicitly feed the command line through a shell for you.
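In other words, the shell-form CMD above is roughly equivalent to
CMD ["/bin/sh", "-c", "python myapp.py images/*"]
and it is the shell that expands the images/* glob before Python starts.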
(You also might consider changing your application to support taking a directory name as configuration in some form and “baking it in” to your application, if these images will be in a fixed place in the container filesystem.)
I would only set ENTRYPOINT when (a) you are setting it to a wrapper shell script that does some first-time setup and then runs exec "$@"; or (b) when you have a FROM scratch image with a static binary and you literally cannot do anything with the container besides run the one binary in it.
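For case (a), the wrapper is typically a small script along these lines (a sketch; the file name and setup step are placeholders):
#!/bin/sh
# docker-entrypoint.sh: perform one-time setup, then hand control to the container's command
set -e
# ... first-time setup (config generation, waiting for a database, etc.) goes here ...
exec "$@"
Pair it with ENTRYPOINT ["./docker-entrypoint.sh"] and keep the real command in CMD, so docker run arguments still behave normally.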
One issue I found was that the app wasn't accessible from outside the container. I added this to app.run:
host='0.0.0.0'
According to this:
Deploying a minimal flask app in docker - server connection issues
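For reference, a minimal sketch of what that change looks like in the Flask app (module and route names assumed):
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "ok"

if __name__ == "__main__":
    # Bind to all interfaces so the published port mapping (-p 5001:5000) can reach the app
    app.run(host="0.0.0.0", port=5000)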
Next, Docker panics when you add a directory to the CMD parameters.
So, I removed ENTRYPOINT and CMD and manually added the command to the Docker run:
docker run -d -p 5001:5000 myapp python myapp.py images/*.jpg

Can't create conda env in dockerfile

I have an environment.yml in my applications folder
I have this in my dockerfile:
RUN conda env create
RUN source activate myenvfromymlfile
When I run the container, though, the env is not activated. If I do conda env list, I see /opt/conda is activated:
root@9c7181cf86aa:/app# conda env list
# conda environments:
#
myenvfromymlfile         /opt/conda/envs/myenvfromymlfile
root                  *  /opt/conda
If I attach to the container I can manually run source activate myenvfromymlfile and it works, but why doesn't that work in the RUN directive??
In examples, I see this often in dockerfiles that require conda:
CMD [ "source activate your-environment && exec python application.py" ]
Can someone explain why it is necessary to use && to make it a single command? And why running "source activate" in a RUN directive does not work? I want to have my dockerfile look like this:
RUN conda env create
RUN source activate myenvfromymlfile
ENTRYPOINT ["python"]
CMD ["application.py"]
Consider the below Dockerfile
RUN conda env create
RUN source activate myenvfromymlfile
ENTRYPOINT ["python"]
CMD ["application.py"]
Statement #1, conda env create, creates the environment and changes files on disk.
Statement #2, source activate myenvfromymlfile, loads some settings into that build-time bash session. No disk changes are made here.
Statements #3 and #4 specify what happens when you run the container:
ENTRYPOINT ["python"]
CMD ["application.py"]
So now, when you run the container, anything you did in statement #2 is gone: a shell was launched to run statement #2 and was closed when it finished. When you run the image, a brand new shell starts, with no knowledge that your Dockerfile once ran source activate myenvfromymlfile during the build.
Now you want to run application.py in the environment you created. Docker's default shell is sh -c, so when you set CMD as below,
CMD [ "source activate your-environment && exec python application.py" ]
the final command executed at container start becomes
sh -c "source activate your-environment && exec python application.py"
which activates the environment in the current shell and then runs your program.
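Putting this together, a minimal Dockerfile sketch following that pattern (base image and file names assumed from context; bash is used explicitly so that source is available):
FROM continuumio/miniconda3
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml
COPY application.py /app/application.py
WORKDIR /app
# Activate the environment in a shell, then replace that shell with the python process
CMD ["bash", "-c", "source activate myenvfromymlfile && exec python application.py"]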
