Passing PIP_EXTRA_INDEX_URL to docker build - python

I am building an app that has a dependency available on a private PyPI server.
My Dockerfile looks like this:
FROM python:3.6
WORKDIR /src/mylib
COPY . ./
RUN pip install .
I want pip to use the extra server to install the dependencies. So I'm trying to pass the PIP_EXTRA_INDEX_URL environment variable during the build phase like so:
"docker build --pull -t $IMAGE_TAG --build-arg PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL ."
For some reason it is not working as intended, and RUN echo $PIP_EXTRA_INDEX_URL returns nothing.
What is wrong?

You should add an ARG instruction to your Dockerfile. It should look like this:
FROM python:3.6
ARG PIP_EXTRA_INDEX_URL
# YOU CAN ALSO SET A DEFAULT VALUE:
# ARG PIP_EXTRA_INDEX_URL=DEFAULT_VALUE
RUN echo "PIP_EXTRA_INDEX_URL = $PIP_EXTRA_INDEX_URL"
# you could also use braces - ${PIP_EXTRA_INDEX_URL}
WORKDIR /src/mylib
COPY . ./
RUN pip install .
If you want to know more, take a look at this article.
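Note that declaring the ARG is enough for pip: build args are exposed as environment variables to RUN instructions in that stage, and pip reads PIP_EXTRA_INDEX_URL from the environment automatically. If you'd rather verify this from Python than with echo, here's a minimal sketch (the file name check_index.py is just illustrative) that you could run during the build, e.g. with RUN python check_index.py placed after the COPY:
# check_index.py - print the extra index URL pip will use, if any.
# pip picks PIP_EXTRA_INDEX_URL up from the environment on its own, so if this
# prints your URL, `pip install .` will consult the private server as well.
import os

print("PIP_EXTRA_INDEX_URL =", os.environ.get("PIP_EXTRA_INDEX_URL", "<not set>"))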

Related

Docker run does not produce any endpoint

I am trying to dockerize this repo. After building it like so:
docker build -t layoutlm-v2 .
I try to run it like so:
docker run -d -p 5001:5000 layoutlm-v2
It downloads the necessary libraries and packages, and then... nothing. No errors, no endpoints generated, just radio silence.
What's wrong? And how do I fix it?
You appear to be expecting your application to offer a service on port 5000, but that doesn't seem to be how your code behaves.
Looking at your code, you seem to be launching a service using Gradio. According to the quickstart, calling gr.Interface(...).launch() will launch a service on localhost:7860, and indeed, if you inspect a container booted from your image, we see:
root@74cf8b2463ab:/app# ss -tln
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 2048 127.0.0.1:7860 0.0.0.0:*
There's no way to access a service listening on localhost from outside the container, so we need to figure out how to fix that.
Looking at these docs, it looks like you can control the listen address using the server_name parameter:
server_name
to make app accessible on local network, set this to "0.0.0.0". Can be set by environment variable GRADIO_SERVER_NAME. If None, will use "127.0.0.1".
So if we run your image like this:
docker run -p 7860:7860 -e GRADIO_SERVER_NAME=0.0.0.0 layoutlm-v2
Then we should be able to access the interface on the host at http://localhost:7860/... and indeed, that seems to work.
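If you'd rather fix this in the code than through the environment variable, launch() also accepts server_name (and server_port) directly. A hedged sketch, assuming the app builds a gr.Interface; the function and variable names here are illustrative, not taken from the repo:
import gradio as gr

def predict(text):
    # stand-in for the real model call
    return text.upper()

demo = gr.Interface(fn=predict, inputs="text", outputs="text")

# Bind to all interfaces so the service is reachable from outside the container;
# 7860 is Gradio's default port, made explicit here to match the -p mapping.
demo.launch(server_name="0.0.0.0", server_port=7860)
Either approach works; the environment variable has the advantage of not requiring a code change or an image rebuild.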
Unrelated to your question:
You're setting up a virtual environment in your Dockerfile, but you're not using it, primarily because of a typo here:
ENV PATH="VIRTUAL_ENV/bin:$PATH"
You're missing a $ on $VIRTUAL_ENV.
You could optimize the order of operations in your Dockerfile. Right now, making a simple change to your Dockerfile (e.g., editing the CMD setting) will cause much of your image to be rebuilt. You could avoid that by restructuring the Dockerfile like this:
FROM python:3.9
# Install dependencies
RUN apt-get update && apt-get install -y tesseract-ocr
RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2
COPY . /app
# Run the application:
CMD ["python", "-u", "app.py"]

How to pass two files from different directories to docker run?

I have a Docker container built from a Dockerfile that looks like this:
FROM python:3.8
ADD lodc.py .
RUN pip install requests python-dotenv
CMD [ "python", "./lodc.py", "file1.json", "file2.json" ]
It needs to take an env file and then two different files as the arguments needed for the script lodc.py to run. I have tried mounting them as described here: Passing file as argument to Docker container, but I cannot get it to work. It is important to keep the two files outside the image because they will be changing frequently, so it doesn't make sense to bake them into the container. Here is what I've been running:
docker run --env-file /Users/Documents/github/datasets/tmp/.env -v /Users/Documents/datasets/files:/Users/Documents/github/datasets datasets datasets/file1.json datasets/file2.json
Basically I would like to just build and run the Docker container and be able to manipulate the two argument files in another directory whenever I want without issue.
The env file is being passed correctly; it is failing because it cannot find the JSON files. I am new to Docker and any help would be greatly appreciated.
Thanks.
I think you just got the run command wrong. Try this one:
docker run \
--env-file /Users/Documents/github/datasets/tmp/.env \
-v /Users/Documents/datasets/files/file1.json:/file1.json \
-v /Users/Documents/github/datasets/files/file2.json:/file2.json \
<your-built-docker-image-name>
I'm not sure about your paths, but you need to run it with two different volumes (-v arguments).
If the approach you are trying does not work, you can also try the option below (keep in mind that you will have to rebuild the Docker image every time the input files change):
I created the Dockerfile below based on the directory structure in your comments:
FROM python:3.8
ADD lodc.py .
COPY dir1/file1.json .
COPY dir2/file2.json .
CMD [ "python", "./lodc.py", "file1.json", "file2.json"]
Please let me know if you need more help.
To give a complete answer, you can do it three ways:
In the Dockerfile you can use the COPY command - this is nice, but it kind of hides what data the container depends on if you move to an orchestration tool like Kubernetes or Docker Compose down the line
FROM python:3.8
ADD lodc.py .
COPY dir1/file1.json .
COPY dir2/file2.json .
CMD [ "python", "./lodc.py", "file1.json", "file2.json"]
You can reference a volume on the command line using the -v argument; this is less preferable because it doesn't codify the volumes and it depends on the directory the command is executed from
docker run \
--env-file /Users/Documents/github/datasets/tmp/.env \
-v /Users/Documents/datasets/files/file1.json:/file1.json \
-v /Users/Documents/github/datasets/files/file2.json:/file2.json \
<your-built-docker-image-name>
You can reference a volume in a docker-compose file; this option is preferred because the volumes are declared in code, it is explicit that they are included, and it isn't dependent on where the command is executed
version: '3.9'
services:
  base:
    build:
      context: .
      dockerfile: <relative-path-to-dockerfile-from-docker-compose>
    image: <desired-image-name>
    env_file:
      - /Users/Documents/github/datasets/tmp/.env
    volumes:
      - /Users/Documents/datasets/files/file1.json:/file1.json
      - /Users/Documents/github/datasets/files/file2.json:/file2.json
The last option will reduce headaches in the long term as you scale because it is the most explicit and stable. Good luck!
You just have to make clear where you are running your Python (i.e., where "you" are inside the container).
In other words, you have to define (somehow) your WORKDIR. By default, the workdir is / (the container's root directory).
Let's suppose, for simplicity, that you define your workdir to be your "datasets/files" directory. (Also, I'll use a shorter path inside the container... because we can.)
So, I'd suggest the following for your Dockerfile and docker-run:
FROM python:3.8
ADD lodc.py .
RUN pip install requests python-dotenv
CMD [ "python", "/lodc.py", "file1.json", "file2.json" ]
build:
$ docker build -t datasets .
run:
$ docker run --env-file /Users/Documents/github/datasets/tmp/.env \
-v /Users/Documents/datasets/files:/mnt/datasets \
-w /mnt/datasets \
datasets
Or you could free your Dockerfile up a little and push the file arguments to your run command:
FROM python:3.8
ADD lodc.py .
RUN pip install requests python-dotenv
ENTRYPOINT [ "python", "/lodc.py" ]
build:
$ docker build -t datasets .
run:
$ docker run --env-file /Users/Documents/github/datasets/tmp/.env \
-v /Users/Documents/datasets/files:/mnt/datasets \
datasets /mnt/datasets/file1.json /mnt/datasets/file2.json
I didn't test it, but it looks (syntactically) ok.
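For context on why the paths matter in both variants: the script only receives whatever strings you pass on the command line, resolved against the working directory inside the container. A hypothetical skeleton of lodc.py (the real script surely differs) that makes that explicit:
import sys
from pathlib import Path

# The two JSON paths arrive exactly as typed after the image name
# (e.g. /mnt/datasets/file1.json) and must exist *inside* the container.
file1 = Path(sys.argv[1]).resolve()
file2 = Path(sys.argv[2]).resolve()

for path in (file1, file2):
    if not path.is_file():
        sys.exit(f"Cannot find {path} inside the container - check your -v mounts")
    print(f"Found {path}")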

How to create a multistage dockerfile for a python app?

Below is the directory structure and the Dockerfile for my Python application. In order to run main.py, I first need to create a data set by running generate_data.py, which is in the data directory. How can I create a multistage Dockerfile that first creates the data and then runs main.py? I'm new to using Docker and I feel overwhelmed.
FROM python:3.7.2-slim
WORKDIR /usr/src/app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /usr/src/app
CMD ["python", "./src/main.py"]
You can create a shell script and then use that as the CMD.
start.sh:
#!/bin/bash
python generate_data.py
python ./src/main.py
Dockerfile:
FROM python:3.7.2-slim
WORKDIR /usr/src/app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /usr/src/app
CMD ["sh", "start.sh"]
A key point of using Docker is to isolate your programs, so at first glance you might want to move them to separate containers and have them talk to each other using a shared volume or a Docker network. But if you really need them to run in the same container, you can achieve this with a bash script, replacing CMD with:
COPY run.sh .
RUN chmod a+x run.sh
CMD ["./run.sh"]
You can also include if statements in the bash script and pass arguments to it through docker run.
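If you'd rather not introduce a shell script at all, the same sequencing can be done with a small Python launcher used as the CMD instead. A sketch; the run_all.py name and the data/ and src/ paths follow the question's description of the layout, so adjust them to the real tree:
# run_all.py - generate the data set, then start the app.
# check=True stops the launcher if the generation step fails, so main.py
# never runs against missing data.
import subprocess
import sys

subprocess.run([sys.executable, "data/generate_data.py"], check=True)
subprocess.run([sys.executable, "src/main.py"], check=True)
The Dockerfile's last line would then be CMD ["python", "run_all.py"].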

Docker file for running a Python program with parameters

I'm new to Docker. I have a Python program that I run in the following way.
python main.py --s=aws --d=scylla --p=4 --b=15 --e=local -w
Please note the double hyphen -- for the first five parameters and the single hyphen - for the last one.
I'm trying to run this inside a Docker container. Here's my Dockerfile:
FROM python:3.6
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python","app.py","--s","(source)", "--d","(database)","--w", "(workers)", "--b", "(bucket)", "--e", "(env)", "-w"]
I'm not sure if this will work as I don't know exactly how to test and run this. I want to run the Docker image with the following port mappings:
docker run --name=something -p 9042:9042 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9160:9160 -p 9180:9180 -p 10000:10000 -d user/something
How can I correct the Docker file? Once I build an image how to run it?
First, fix the Dockerfile:
FROM python:3.6
COPY . /app
WORKDIR /app
# optional: it is better to chain commands to reduce the number of created layers
RUN pip install --upgrade pip \
&& pip install --no-cache-dir -r requirements.txt
# mandatory: "--s=smth" is one argument
# optional: it's better to use environment variables for source, database etc
CMD ["python","app.py","--s=(source)", "--d=(database)","--w=(workers)", "--b=(bucket)", "--e=(env)", "-w"]
Then, build it:
docker build -f "<dockerfile path>" -t "<tag to assign>" "<build dir (eg .)>"
Then, you can just use the assigned tag as an image name:
docker run ... <tag assigned>
UPD: I got it wrong the first time; the tag should be used in place of the image name, not the instance name.
UPD2: In the first version of this answer I assumed you were going to hardcode the parameters and only mentioned that it is better to use environment variables. Here is an example of how to do it.
First, make your Python script able to pick up its settings from environment variables.
The quickest and dirtiest way to do so is to replace CMD with something like:
CMD ["sh", "-c", "python app.py --s=$SOURCE --d=$DATABASE --w=$WORKERS ... -w"]
(it is common to use CAPS names for environment variables)
It will be better, however, to read environment variables directly in your Python script instead of command line arguments, or use them as defaults:
# somewhere in app.py
import os
...
DATABASE = os.environ.get('DATABASE', default_value)  # can default to args.d
SOURCE = os.environ.get('SOURCE') # None by default
# etc
Don't forget to update the Dockerfile as well in this case:
# Dockerfile:
...
CMD ["python","app.py"]
Finally, pass environment variables to your run command:
docker run --name=something ... -e DATABASE=<dbname> -e SOURCE=<source> ... <tag assigned at build>
There are more ways to pass environment variables; I'll just refer to the official documentation here:
https://docs.docker.com/compose/environment-variables/
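If you want to keep the command-line interface and only fall back to environment variables when a flag isn't given, you can also combine argparse with os.environ. A sketch; the flag names come from the question and the SOURCE/DATABASE variable names from the example above, while BUCKET and the default values are purely illustrative:
import argparse
import os

parser = argparse.ArgumentParser()
# Environment variables act as defaults; flags passed explicitly still win.
parser.add_argument("--s", default=os.environ.get("SOURCE", "aws"))
parser.add_argument("--d", default=os.environ.get("DATABASE", "scylla"))
parser.add_argument("--b", type=int, default=int(os.environ.get("BUCKET", "15")))
parser.add_argument("-w", action="store_true")
args = parser.parse_args()

print(args.s, args.d, args.b, args.w)
With this in place the CMD can stay as a plain ["python", "app.py"], and both docker run -e settings and explicit flags keep working.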

What is a good way to add python dependencies to a Docker container?

I am trying to integrate Docker into my Django workflow and I have everything set up except one really annoying issue. If I want to add dependencies to my requirements.txt file, I basically have to rebuild the entire container image for those dependencies to stick.
For example, I followed the docker-compose example for Django here. The YAML file is set up like this:
db:
  image: postgres
web:
  build: .
  command: python manage.py runserver 0.0.0.0:8000
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - db
and the Dockerfile used to build the web container is set up like this:
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/
So when the image is built for this container requirements.txt is installed with whatever dependencies are initially in it.
If I am using this as my development environment, it becomes very difficult to add any new dependencies to that requirements.txt file because I will have to rebuild the container for the changes in requirements.txt to be installed.
Is there some sort of best practice out there in the Django community to deal with this? If not, I would say that Docker looks very nice for packaging up an app once it is complete, but is not very good to use as a development environment. It takes a long time to rebuild the container, so a lot of time is wasted.
I appreciate any insight. Thanks.
You could mount requirements.txt as a volume when using docker run (untested, but you get the gist):
docker run -v "$(pwd)/requirements.txt:/code/requirements.txt" container:tag
Then you could bundle a script with your container which runs pip install -r requirements.txt before starting your application, and use that as your ENTRYPOINT. I love the custom entrypoint script approach; it lets me do a little extra work without needing to make a new container.
That said, if you're changing your dependencies, you're probably changing your application and you should probably make a new container and tag it with a later version, no? :)
So I changed the YAML file to this:
db:
  image: postgres
web:
  build: .
  command: sh startup.sh
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - db
I made a simple shell script startup.sh:
#!/bin/bash
#restart this script as root, if not already root
[ `whoami` = root ] || exec sudo $0 $*
pip install -r dev-requirements.txt
python manage.py runserver 0.0.0.0:8000
and then made a dev-requirements.txt that is installed by the above shell script as a sort of dependency staging environment.
When I am satisfied with a dependency in dev-requirements.txt, I just move it over to requirements.txt to be committed to the next build of the image. This gives me flexibility to play with adding and removing dependencies while developing.
I think the best way is to ignore what's currently the most common way to install Python dependencies (pip install -r requirements.txt) and specify your requirements directly in the Dockerfile, effectively getting rid of the requirements.txt file. Additionally, you get Docker's layer caching for free.
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
# make sure you install requirements before the ADD, since everything after ADD is not cached
RUN pip install flask==0.10.1
RUN pip install sqlalchemy==1.0.6
...
ADD . /code/
If the Docker container is the only way your application is ever run, then I would suggest you do it this way. If you want to support other means of setting up your code (e.g. virtualenv), then this is of course not for you and you should fall back to either using a requirements file or a setup.py routine. Either way, I found this approach to be the simplest and most straightforward, without dealing with all the messed-up Python package distribution issues.
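If you already have a requirements.txt and want to migrate to this per-package style, a throwaway helper like the one below (purely illustrative, not part of any answer above) prints the individual RUN lines for you to paste into the Dockerfile:
# requirements_to_run_lines.py - turn each pinned requirement into its own
# "RUN pip install" instruction so every dependency gets its own cached layer.
from pathlib import Path

for line in Path("requirements.txt").read_text().splitlines():
    line = line.strip()
    if not line or line.startswith("#"):
        continue  # skip blanks and comments
    print(f"RUN pip install {line}")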
