Transfer virtualenv to docker image - python

Is it possible to transfer virtual environment data from a local host to a docker image via the ADD command?
Rather than doing pip installs inside the container, I would rather the user have all of that done locally and simply transfer the virtual environment into the container, given that all of the files have the same names locally as in the docker container and all directories are nested the same way.
This would save minutes to hours if it were possible to transfer virtual environment settings into a docker image. Maybe I am thinking about this the wrong way.
It just feels very inefficient to do pip installs via a requirements.txt that was passed into the container, as opposed to doing it all locally; otherwise, each time the image is started up it has to re-install the same dependencies, which have not changed from one build to the next.

We had run into this problem earlier and here are a few things we considered:
Consider building base images that have the common packages installed. The app images can then use one of these base images and install only the differential.
Cache the pip packages on a local path that can be mounted into the container; that saves the time needed to download the packages (sketched below).
Depending on the complexity of your project, one may suit better than the other; you may also consider a hybrid approach to find maximum optimization.
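A minimal sketch of the second option is to pre-download the packages on the host and point pip at them; the paths and file names here are only illustrative, and the wheels directory could just as well be bind-mounted instead of copied:
# On the host: download all dependencies as wheels
pip download -r requirements.txt -d ./wheels
# In the Dockerfile: install from the local wheel directory, no network needed
COPY wheels /wheels
COPY requirements.txt .
RUN pip install --no-index --find-links=/wheels -r requirements.txt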

While possible, it's not recommended.
Dependencies (library versions, globally installed packages) can differ between the host machine and the container.
Image builds will not be 100% reproducible on other hosts.
The impact of pip install is not big. Each RUN command creates its own layer, which is cached locally and also in the repository, so pip install will be re-run only when requirements.txt changes (or a previous layer is rebuilt).
To trigger pip install only on requirements.txt changes, the Dockerfile should start this way:
...
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src/ ./
...
Also, it will be run only on image build, not container startup.
If you have multiple containers with the same dependencies, you can build an intermediate image with all of the dependencies and build the other images FROM it.
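For example (the image name my-python-deps and the file names are made up for illustration):
# Dockerfile.deps - built once, rebuilt only when the shared requirements change
FROM python:3
COPY requirements.txt ./
RUN pip install -r requirements.txt
Build it with docker build -t my-python-deps -f Dockerfile.deps . and each application image can then start with:
FROM my-python-deps
COPY src/ ./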

Related

How can I identify the source of a volume inside a docker container?

TL;DR: is there a way to check whether a volume /foo built into a docker image has been overridden at runtime by remounting with -v host/foo:/foo?
I have designed an image that runs a few scripts on container initialization using s6-overlay to dynamically generate a user, transfer permissions, and launch a few services under that uid:gid.
One of the things I need to do is pip install -e /foo, a python module mounted at /foo. This also installs a default version of /foo contained in the docker image if the user doesn't specify a volume. The reason I am doing this install at runtime is that this container is designed to contain the entire environment for development and experimentation on foo, so if a user mounts a system version of foo, e.g. -v /home/user/foo:/foo, they can develop by updating /home/user/foo on the host or /foo in the container; all changes persist, and the image never needs to be rebuilt to pick up new changes. It needs to be an editable install so that new changes don't require reinstallation.
I have this working now.
I would like to speed up container initialization by moving this pip install into the image build, and then only re-install the module at runtime if the user has mounted a new /foo using -v /home/user/foo:/foo.
Of course, there are other ways to do this. For example, I could build foo into the image by copying it to /bar at build time and installing it with pip install /bar. Then at runtime I would just check whether /foo exists: if it doesn't, create a symlink /foo -> /bar; if it does, pip uninstall foo and pip install -e /foo. But this isn't the cleanest solution. I could also just mv /bar /foo at runtime if /foo doesn't exist, but I'm not sure how pip would handle the change in module path.
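For reference, the symlink fallback described above might look roughly like this in an init script (a sketch only; /foo, /bar and the package name foo are taken from the description above):
#!/bin/sh
if [ ! -e /foo ]; then
    # No volume was mounted: fall back to the copy baked into the image.
    ln -s /bar /foo
else
    # A host directory was mounted over /foo: switch to an editable install of it.
    pip uninstall -y foo
    pip install -e /foo
fi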
The only way to do this that I can think of is to map the docker socket into the container, so you can run docker inspect from inside the container and see the mounted volumes. Like this:
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker
and then inside the container
docker inspect $(hostname)
I've used the 'docker' image, since that has docker installed. You can, of course, use another image. You just have to install docker in it.
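To answer the specific question (has /foo been mounted over?), something along these lines should work from inside the container once the socket is mapped in; /foo is the path from the question, and this is a sketch rather than a guaranteed one-liner:
# Prints "bind" if a host directory was mounted at /foo with -v,
# or "volume" if it is the anonymous volume Docker created for the image's VOLUME /foo.
docker inspect --format '{{ range .Mounts }}{{ if eq .Destination "/foo" }}{{ .Type }}{{ end }}{{ end }}' $(hostname)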

How to pip install packages written in Pipfile without creating virtualenv?

I created a package containing a Pipfile, and I want to test it with docker.
I want to install the packages listed in the Pipfile with pip, without creating a virtualenv.
# (do something to create some-file)
RUN pip install (some-file)
How can I do this?
Eventually pip should be able to do this itself, at least that's what they say. Currently, that is not yet implemented.
For now, a Pipfile is a TOML file, so you can use a TOML parser to extract the package constraints and emit them in a format that pip recognizes. For example, if your Pipfile contains only simple string version specifiers, this little script will print requirements that you can redirect into a requirements.txt file and pass to pip install -r:
import sys

import toml

# Load the Pipfile given as the first command-line argument.
with open(sys.argv[1]) as f:
    result = toml.load(f)

# Print each package in a form pip's requirements format understands.
for package, constraint in result['packages'].items():
    if constraint == '*':
        print(package)
    else:
        print(f'{package} {constraint}')
If your Pipfile contains more complicated constructs, you'll have to edit this code to account for them.
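If you save the script as, say, pipfile_to_requirements.py (the name is made up), a Dockerfile could use it like this:
RUN pip install toml
COPY Pipfile pipfile_to_requirements.py ./
RUN python pipfile_to_requirements.py Pipfile > requirements.txt && \
    pip install -r requirements.txt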
An alternative that you might consider, which is suitable for a Docker container, is to use pipenv to install the packages into the system Python installation and just remove the generated virtual environment afterwards.
pipenv install --system
pipenv --rm
However, strictly speaking that doesn't achieve your stated goal of doing this without creating a virtualenv.
One of the other answers led me to this, but I wanted to call it out explicitly and explain why it's a useful solution.
Pipenv is useful because it helps you create a virtual environment. This is great on your local dev machine, where you will often have many projects with different dependencies.
In CI/CD, you will be using containers that are often only spun up for a few minutes to complete part of your CI/CD pipeline. Since you spin up a new container each time you run your pipeline, there is no need to create a virtual environment in the container to keep things organised. You can simply install all your dependencies directly into the main OS version of python.
To do this, run the command below in your CI/CD pipeline:
pipenv install --system
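In a Dockerfile, that might look roughly like this (a sketch; --deploy is optional, it just makes the build fail if Pipfile.lock is out of date):
FROM python:3
WORKDIR /app
COPY Pipfile Pipfile.lock ./
RUN pip install pipenv && \
    pipenv install --system --deploy
COPY . .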

Prevent Docker from installing python package requirements on every build (without requirements.txt)

I'm building an image from a Dockerfile where my main program is a python application that has a number of dependencies. The application is installed via setup.py and the dependencies are listed inside. There is no requirements.txt. I'm wondering if there is a way to avoid having to download and build all of the application dependencies, which rarely change, on every image build. I saw a number of solutions that use the requirements.txt file but I'd like to avoid having one if possible.
You can use requires.txt from the egg info to preinstall the requirements.
WORKDIR path/to/setup/script
RUN python setup.py egg_info
RUN pip install -r pkgname.egg-info/requires.txt
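Note that for Docker's layer cache to actually help here, only setup.py (plus anything it needs) should be copied before the rest of the source; a sketch, reusing the pkgname.egg-info placeholder from above:
COPY setup.py ./
RUN python setup.py egg_info && \
    pip install -r pkgname.egg-info/requires.txt
COPY . ./
RUN pip install .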
One solution: since those dependencies rarely change, as you said, you could build another image that already has those packages installed. You create that image and then save it using docker save, so you end up with a new base image with the required dependencies. docker save creates a .tar with the image. You have to load the image using docker load, and then in your Dockerfile you would do:
FROM <new image with all the dependencies>
# your stuff, no need to run pip install
...
Hope it helps
Docker save description
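Concretely, the save/load round trip could look something like this (image and file names are made up for illustration):
# On a machine where the dependencies image is built:
docker build -t my-deps-base -f Dockerfile.deps .
docker save -o my-deps-base.tar my-deps-base
# On the machine that builds the application image:
docker load -i my-deps-base.tar
docker build -t my-app .   # its Dockerfile starts with: FROM my-deps-base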

How to install conda package from custom file channel in Docker file?

Hi, I have a custom conda channel, something like file://path_to_channel, and I want to install packages from that channel when building a docker image, something like:
...
RUN conda config --add channels file://...
RUN conda install mypackage
...
The problem here is that the host file path is not available inside the docker build at build time.
My question is: apart from copying the whole channel into the docker image, is there another way to install a python package from a custom file-based channel, in the Dockerfile, at build time?
My answer
The answer down below is correct: docker does support build-time mounts now. But I did not go down this path, as we are on an older docker version.
To bypass this, I set up an http server to serve the files. This is extremely easy if you use node or python.
I think this is recently possible using the RUN --mount command. (It may still even be experimental.) You can find some examples here.
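With BuildKit enabled, a build-time bind mount could look roughly like this (a sketch; the channel directory, the base image, and mypackage are placeholders):
# syntax=docker/dockerfile:1
FROM continuumio/miniconda3
RUN --mount=type=bind,source=channel,target=/channel \
    conda install -y -c file:///channel mypackage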
An alternative is to serve the files using a local web server.
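For instance, Python's built-in server can serve the channel directory from the host (the port and address are placeholders; --directory needs Python 3.7+):
python3 -m http.server 8000 --directory /path_to_channel
and the Dockerfile can then add the channel over http instead of file://:
RUN conda config --add channels http://<host-reachable-address>:8000 && \
    conda install -y mypackage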

Deploying a custom python package with `pip`

I have a custom Python package (call it MyProject) on my filesystem with a setup.py and a requirements.txt. This package needs to be used by a Flask server (which will be deployed on AWS/EC2/EB).
In my Flask project directory, I create a virtualenv and run pip install -e ../path/to/myProject.
But for some reason, MyProject's upstream git repo shows up in pip freeze:
...
llvmlite==0.19.0
-e git+https://github.com/USERNAME/MYPROJECT.git#{some-git-hash}
python-socketio==1.8.0
...
The reference to git is a problem, because the repository is private and the deployment server does not (and should not, and will never) have credentials to access it. The deployment server also doesn't even have git installed (and it seems extremely problematic that pip assumes without my permission that it does). There is nothing in MyProject's requirements.txt or setup.py that alludes to git, so I am not sure where the hell this is coming from.
I can dupe the project to a subdirectory of the Flask project, and then put the following in MyFlaskProject's requirements.txt:
...
llvmlite==0.19.0
./MyProject
python-socketio==1.8.0
...
But this doesn't work, because the path is taken as relative to the working directory of the pip process when it is run, not to requirements.txt. Indeed, it seems pip is broken in this respect. In my case, EC2 runs its install scripts from some other directory (with a full path to requirements.txt specified), and as expected, this fails.
What is the proper way to deploy a custom python package as a dependency of another project?
To install your own python package from a git repo, you might want to check this post.
To sort out the credential issue, why not have git installed on the EC2? You could simply create an ssh key and share it with the MyProject repository.
I am using this solution on ECS instances deployed by Jenkins (with Habitus to hide Jenkins' ssh keys while building the image) and it works fine for me!
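Once git and an ssh key with access to the repository are in place, pip can install straight from the private repo; roughly (the egg name is an assumption, the URL placeholder comes from the question):
pip install "git+ssh://git@github.com/USERNAME/MYPROJECT.git#egg=myproject"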
