How to install a conda package from a custom file channel in a Dockerfile? - python

Hi, I have a custom conda channel, something like file://path_to_channel, and I want to install packages from that channel when building a Docker image, something like:
...
RUN conda config --add channels file://...
RUN conda install mypackage
...
The problem here is that the file path does not seem to be mounted into the Docker image at build time.
My question is: apart from copying the whole channel into the Docker image, is there another way to install Python packages from a custom file-based channel in the Dockerfile at build time?
My answer
The answer below is correct: Docker does support build-time mounts now. But I did not go down this path, as we are on an older Docker version.
To bypass this I set up an HTTP server to serve the files. This is extremely easy if you use Node or Python.
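A rough sketch of that workaround (the port, channel path, and package name are placeholders): serve the channel directory with Python's built-in HTTP server on the host, then point conda at that URL in the Dockerfile. Note that host.docker.internal resolves to the host on Docker Desktop; on Linux you may need the host's IP address instead.
# on the host, from the directory that contains the channel
python3 -m http.server 8000
# in the Dockerfile
RUN conda config --add channels http://host.docker.internal:8000/ && \
    conda install -y mypackage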

I think this has recently become possible using the RUN --mount command. (It may still be experimental.) You can find some examples here.
An alternative is to serve the files using a local web server.
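A minimal sketch of the RUN --mount approach, assuming BuildKit is enabled and the channel directory sits inside the build context (the paths and package name are placeholders); the mount is only visible during that single RUN step and is not copied into the image:
# syntax=docker/dockerfile:1
FROM continuumio/miniconda3
# bind-mount the channel from the build context for this step only
RUN --mount=type=bind,source=channel,target=/channel \
    conda install -y -c file:///channel mypackage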

Related

How to containerize a python script from a pulled image from docker hub

First I am very new to docker, so apologies if this doesn't make sense.
This is my situation:
I have a data science/machine learning project in a python script (written in a single .py file). I want to containerize this application.
I would need to create a Dockerfile to do that. But since this is a machine learning project, there are a lot of packages that I need to pip install. So I found this Docker image at https://hub.docker.com/r/continuumio/miniconda3, which has Miniconda installed and provides the packages that I need.
I pulled this image.
And now I don't know what to do with it. How do I continue from here? So far, my Dockerfile is empty. How can I use this image as my starting point, install more modules if needed, and then finally containerize my Python script based on this modified image?
Many thanks.
You can specify a base image using the FROM command.
Example:
FROM continuumio/miniconda3:latest
You'll want to use the RUN command to install dependencies (assuming you need any more), the COPY command to get your main.py file into the container, and CMD to set a default command to run when the container starts up. Documentation for each of those can be found here.
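Putting those pieces together, a minimal sketch might look like this (the script name main.py and the extra packages are assumptions; substitute your own):
FROM continuumio/miniconda3:latest
WORKDIR /app
# install any extra modules your script needs (package names are examples)
RUN conda install -y numpy pandas scikit-learn
# copy the script into the image
COPY main.py .
# default command when the container starts
CMD ["python", "main.py"]
You would then build it with docker build -t my-ml-app . and run it with docker run my-ml-app.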

How can I identify the source of a volume inside a docker container?

TL;DR: is there a way to check if a volume /foo built into a Docker image has been overwritten at runtime by remounting using -v host/foo:/foo?
I have designed an image that runs a few scripts on container initialization using s6-overlay to dynamically generate a user, transfer permissions, and launch a few services under that uid:gid.
One of the things I need to do is pip install -e /foo a python module mounted at /foo. This also installs a default version of /foo contained in the docker image if the user doesn't specify a volume. The reason I am doing this install at runtime is because this container is designed to contain the entire environment for development and experimentation of foo, so if a user mounts a system version of foo, e.g. -v /home/user/foo:/foo, the user can develop by updating host:/home/user/foo or in container:/foo and all changes will persist and the image won't need to be rebuilt to get new changes. It needs to be an editable install so that new changes don't require reinstallation.
I have this working now.
I would like to speed up container initialization by moving this pip install into the image build, and then only install the module at runtime if the user has mounted a new /foo at runtime using -v /home/user/foo:/foo.
Of course, there are other ways to do this. For example, I could build foo into the image copying it to /bar at build time and install foo using pip install /bar... Then at runtime just check if /foo exists and if it doesn't then create a symlink /foo->/bar. If it does exist then pip uninstall foo and pip install -e /foo.. but this isn't the cleanest solution. I could also just mv /bar /foo at runtime if /foo doesn't exist.. but I'm not sure how pip will handle the change in module path.
The only way to do this that I can think of is to map the Docker socket into the container, so you can run docker inspect from inside the container and see the mounted volumes, like this:
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker
and then inside the container
docker inspect $(hostname)
I've used the docker image, since that has Docker installed. You can, of course, use another image; you just have to install Docker in it.
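To answer the TL;DR specifically, you could filter the inspect output down to the mount for /foo. This is only a sketch, and jq is an assumption (it is not in the docker image by default):
docker inspect --format '{{json .Mounts}}' "$(hostname)" | \
    jq '.[] | select(.Destination == "/foo")'
# the entry's Type and Source show whether /foo is a host bind mount
# (-v /home/user/foo:/foo gives Type "bind") or the image's default
# anonymous volume (Type "volume")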

Cannot run Google Vision API on AWS Lambda [duplicate]

I am trying to use Google Cloud Platform (specifically, the Vision API) for Python with AWS Lambda. Thus, I have to create a deployment package for my dependencies. However, when I try to create this deployment package, I get several compilation errors, regardless of the version of Python (3.6 or 2.7). With version 3.6, I get the issue "Cannot import name 'cygrpc'". For 2.7, I get some unknown error with the .pth file. I am following the AWS Lambda deployment package instructions here. They recommend two options, and both do not work / result in the same issue. Is GCP just not compatible with AWS Lambda for some reason? What's the deal?
Neither Python 3.6 nor 2.7 work for me.
NOTE: I am posting this question here to answer it myself because it took me quite a while to find a solution, and I would like to share my solution.
TL;DR: You cannot compile the deployment package on your Mac or whatever PC you use. You have to do it on the same OS/setup that AWS Lambda uses to run your code. To do this, you have to use EC2.
I will provide here an answer on how to get Google Cloud Vision working on AWS Lambda for Python 2.7. This answer is potentially extendable to other APIs and other programming languages on AWS Lambda.
So my journey to a solution began with this initial posting on GitHub with others who have the same issue. One solution someone posted was:
I had the same issue "cannot import name 'cygrpc'" while running the lambda. Solved it with pip install google-cloud-vision in the AMI amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 instance and exported the lib/python3.6/site-packages to aws lambda. Thank you @tseaver
This is partially correct, unless I read it wrong, but regardless it led me on the right path. You will have to use EC2. Here are the steps I took:
1. Set up an EC2 instance by going to EC2 on Amazon. Do a quick read about AWS EC2 if you have not already. Set one up for amzn-ami-hvm-2018.03.0.20180811-x86_64-gp2 or something along those lines (i.e. the most up-to-date one).
2. Get your EC2 .pem file. Go to your Terminal and cd into the folder where your .pem file is. SSH into your instance using
ssh -i "your-file-name-here.pem" ec2-user@ec2-ip-address-here.compute-1.amazonaws.com
3. Create the following folders on your instance using mkdir: google-cloud-vision, protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2.
4. On your EC2 instance, cd into google-cloud-vision. Run the command:
pip install google-cloud-vision -t .
Note: if you get "bash: pip: command not found", then run sudo easy_install pip (source).
5. Repeat step 4 with the following packages, cd'ing into the respective folder each time: protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2.
6. Copy each folder to your computer. You can do this using the scp command. Again in your local Terminal (not your EC2 instance, and not the Terminal window you used to access your EC2 instance), run the command below; it is an example for the "google-cloud-vision" folder, so repeat it for every folder:
sudo scp -r -i your-pem-file-name.pem ec2-user@ec2-ip-address-here.compute-1.amazonaws.com:~/google-cloud-vision ~/Documents/your-local-directory/
7. Stop your EC2 instance from the AWS console so you don't get overcharged.
8. For your deployment package, you will need a single folder containing all your modules and your Python scripts. To begin combining the modules, create an empty folder titled "modules". Copy and paste all of the contents of the "google-cloud-vision" folder into the "modules" folder. Now place only the folder titled "protobuf" from the "protobuf" main folder into the "Google" folder of the "modules" folder. Also from the "protobuf" main folder, paste the protobuf .pth file and the -info folder into the "Google" folder.
9. For each module after protobuf, copy and paste into the "modules" folder the folder titled with the module name, the .pth file, and the "-info" folder.
10. You now have all of your modules properly combined (almost). To finish the combination, remove these two files from your "modules" folder: googleapis_common_protos-1.5.3-nspkg.pth and google_cloud_vision-0.34.0-py3.6-nspkg.pth. Copy and paste everything in the "modules" folder into your deployment package folder. Also, if you're using GCP, paste in the .json file for your credentials as well.
11. Finally, put your Python scripts in this folder, zip the contents (not the folder), upload to S3, and paste the link into your AWS Lambda function and get going!
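A quick sketch of that final zipping step (folder and file names are placeholders); zipping from inside the folder keeps the modules and scripts at the top level of the archive, which is what Lambda expects:
cd your-deployment-package-folder
zip -r ../deployment-package.zip .
# upload deployment-package.zip to S3 and point your Lambda function at it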
If something here doesn't work as described, please forgive me and either message me or feel free to edit my answer. Hope this helps.
Building off the answer from @Josh Wolff (thanks a lot, btw!), this can be streamlined a bit by using a Docker image that replicates the Lambda build environment (lambci/lambda below).
You can either bundle the libraries with your project source or, as I did below in a Makefile script, upload them as an AWS Lambda layer.
layer:
	set -e ;\
	docker run -v "$(PWD)/src":/var/task "lambci/lambda:build-python3.6" /bin/sh -c "rm -R python; pip install -r requirements.txt -t python/lib/python3.6/site-packages/; exit" ;\
	pushd src ;\
	zip -r my_lambda_layer.zip python > /dev/null ;\
	rm -R python ;\
	aws lambda publish-layer-version --layer-name my_lambda_layer --description "Lambda layer" --zip-file fileb://my_lambda_layer.zip --compatible-runtimes "python3.6" ;\
	rm my_lambda_layer.zip ;\
	popd ;
The above script will:
Pull the Docker image if you don't have it yet (above uses Python 3.6)
Delete the python directory (only useful for running a second time)
Install all requirements to the python directory, created in your projects /src directory
ZIP the python directory
Upload the AWS layer
Delete the python directory and zip file
Make sure your requirements.txt file includes the modules listed above by Josh: google-cloud-vision, protobuf, google-api-python-client, httplib2, uritemplate, google-auth-httplib2
There's a fast solution that doesn't require much coding.
Cloud9 runs on an Amazon Linux AMI, so installing with pip in its virtual environment should produce packages that work on Lambda.
I created a Lambda from the Cloud9 UI and, from the console, activated the venv for the EC2 machine. I then installed google-cloud-speech with pip. That was enough to fix the issue.
I was facing the same error using the google-ads API.
{
  "errorMessage": "Unable to import module 'lambda_function': cannot import name 'cygrpc' from 'grpc._cython' (/var/task/grpc/_cython/__init__.py)",
  "errorType": "Runtime.ImportModuleError",
  "stackTrace": []
}
My Lambda runtime was Python 3.9 and architecture x86_64.
If somebody encounters a similar ImportModuleError, see my answer here: Cannot import name 'cygrpc' from 'grpc._cython' - Google Ads API

Docker Python Image

I have a RHEL host with Docker installed; it has the default Python 2.7. My Python scripts need a few more modules, which I can't install due to lack of sudo access, and moreover I don't want to mess with the default Python, which is needed for the host to function.
Now I am trying to get Python in a Docker container where I can add the few modules and do the needful.
Issue: the RHEL host with Docker is not connected to the internet and can't be connected either.
The laptop I have doesn't have Docker, and I can't install Docker here (no admin access) to create the Docker image and copy it to the RHEL host.
I was hoping that if a Docker image with Python can be downloaded from the internet, I might be able to use that as-is!
Any pointers in any appropriate direction would be appreciated.
What have I done: tried searching for Python images and been through the Docker documentation on creating images.
Apologies if the above question sounds silly; I am getting better with time on Docker :)
If your environment is restricted enough that you can't use sudo to install packages, you won't be able to use Docker: if you can run any docker run command at all you can trivially get unrestricted root access on the host.
My Python scripts need a few more modules, which I can't install due to lack of sudo access, and moreover I don't want to mess with the default Python, which is needed for the host to function.
That sounds like a perfect use for a virtual environment: it gives you an isolated local package tree that you can install into as an unprivileged user and doesn't interfere with the system Python. For Python 2 you need a separate tool for it, with a couple of steps to install:
export PYTHONUSERBASE=$HOME
pip install --user virtualenv
~/bin/virtualenv vpy
. vpy/bin/activate
pip install ... # installs into vpy/lib/python2.7/site-packages
You can create a Docker image on any standalone machine and push the final required image to a Docker registry (Docker Hub). Then on your laptop you can pull that image and start working :)
Below are some key commands that will be required for the same.
To create an image, you will need to create a Dockerfile with all the packages installed.
Or you can also do sudo docker run -it ubuntu:16.04 then install python and other packages as required.
then sudo docker commit container_id name
sudo docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
sudo docker push IMAGE_NAME
Then you pull this image on your laptop and start working.
You can refer to this link for more docker commands https://github.com/akasranjan005/docker-k8s/blob/master/docker/basic-commands.md
Hope this helps. Thanks
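For the Dockerfile route mentioned in this answer, a minimal sketch could look like the following (the base image, module names, script name, and image tag are all assumptions; adjust to your case):
# Dockerfile
FROM python:2.7
RUN pip install requests numpy
COPY my_script.py /app/my_script.py
CMD ["python", "/app/my_script.py"]
# build and push from the standalone machine
sudo docker build -t your-dockerhub-user/my-python:latest .
sudo docker push your-dockerhub-user/my-python:latest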

Transfer virtualenv to docker image

Is it possible to transfer virtual environment data from a local host to a docker image via the ADD command?
Rather than doing pip installs inside the container, I would rather the user have all of that done locally and simply transfer the virtual environment into the container. Granted, all of the files have the same names locally as in the Docker container, along with all directories being nested properly.
This would save minutes to hours if it were possible to transfer virtual environment settings into a Docker image. Maybe I am thinking about this at the wrong level of abstraction.
It just feels very inefficient doing pip installs via a requirements.txt that was passed into the container, as opposed to doing it all locally; otherwise, each time the image is started up it has to re-install the same dependencies, which have not changed between builds.
We had run into this problem earlier and here are a few things we considered:
Consider building base images that have the common packages installed. The app containers can then use one of these base images and install only the differential.
Cache the pip packages on a local path that can be mounted into the container; that'll save the time to download the packages (a sketch of this follows below).
Depending on the complexity of your project one may suit better than the other - you may also consider a hybrid approach to find maximum optimization.
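On the second point, one way to sketch the pip-cache idea (the cache directory, image name, and mount paths are assumptions): download the packages once on the host, then mount that directory into the container and install from it without hitting the network.
# on the host: populate a local cache of the packages (run once, or when requirements change)
pip download -r requirements.txt -d ./pip-cache
# in the container: mount the cache and install offline from it
docker run -v "$(pwd)/pip-cache:/pip-cache" \
           -v "$(pwd)/requirements.txt:/requirements.txt" \
           my-image \
           pip install --no-index --find-links=/pip-cache -r /requirements.txt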
While possible, it's not recommended.
Dependencies (library versions, globally installed packages) can be different on host machine and container.
Image builds will not be 100% reproducible on other hosts.
The impact of pip install is not big. Each RUN command creates its own layer, and layers are cached locally and also in the repository, so pip install will be re-run only when requirements.txt is changed (or previous layers are rebuilt).
To trigger pip install only on requirements.txt changes, Dockerfile should start this way:
...
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src/ ./
...
Also, it will be run only on image build, not container startup.
If you have multiple containers with the same dependencies, you can build an intermediate image with all the dependencies and build the other images FROM it.
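A minimal sketch of that intermediate-image idea (the image names, tag, and main.py are placeholders, and python:3.9 is just an example base):
# Dockerfile.base -- shared dependencies, built and pushed once
FROM python:3.9
COPY requirements.txt ./
RUN pip install -r requirements.txt
# Dockerfile -- each app image builds FROM the base and only adds its own code
FROM my-registry/my-python-base:latest
COPY src/ ./
CMD ["python", "main.py"]
Build the base with docker build -f Dockerfile.base -t my-registry/my-python-base:latest . and rebuild it only when the shared dependencies change.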
