How to create a docker container with Python and Orange

Does anyone know how I can create a docker container with Python and Orange, without installing the whole Anaconda package?
I managed to make it work with an 8.0 GB container, but that is way too big.

From the GitHub project page, look at the README, and download the appropriate requirements-* files. Create a directory containing the file(s), and write a Dockerfile like this:
FROM python:3.7
RUN pip install PyQt5
COPY requirements-core.txt /tmp
RUN pip install -r requirements-core.txt
# repeat the previous two commands with other files, if needed
RUN pip install git+https://github.com/biolab/orange3
Add any other commands as needed, e.g. to COPY your source code.
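For reference, a minimal sketch of building and smoke-testing the resulting image; the tag orange3-minimal is just an example name:
# build from the directory that holds the Dockerfile and the requirements files
docker build -t orange3-minimal .
# quick check that Orange imports inside the container
docker run --rm orange3-minimal python -c "import Orange"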

Related

How to install package dependencies for a custom Airbyte connector?

I'm developing a custom connector for Airbyte, and it involves extracting files from different compressed formats, like .zip or .7z. My plan was to use patool for this, and indeed it works in local tests, running:
python main.py read --config valid_config.json --catalog configured_catalog_old.json
However, since Airbyte runs in docker containers, I need those containers to have packages like p7zip installed. So my question is, what is the proper way to do that?
I just downloaded and deployed Airbyte Open Source on my own machine using the recommended commands listed in the Airbyte documentation:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker compose up
I tried using docker exec -it CONTAINER_ID bash to get into airbyte/worker and airbyte/connector-builder-server and install p7zip directly, but that isn't working yet. My connector calls patoolib from a Python script, but it is unable to process the given file because it fails to find a program to extract it. This is the log output:
> patool: Extracting /tmp/tmpan2mjkmn ...
> unknown archive format for file `/tmp/tmpan2mjkmn'
It turns out I completely ignored that the connector template comes with a Dockerfile, which is used precisely to configure the container that runs the connector code. So all I had to do was add this line to the Dockerfile:
RUN apt-get update && apt-get install -y file p7zip p7zip-full lzma lzma-dev
Specifically, to use patoolib I had to install the file package, so that it could detect the MIME type of archive files.
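As a rough sketch, the relevant part of such a connector Dockerfile ends up looking like this; the base image line is a placeholder for whatever your connector template already specifies:
# FROM <base image provided by the connector template>
# system tools patool relies on: file for type detection, 7z/lzma for extraction
RUN apt-get update && apt-get install -y file p7zip p7zip-full lzma lzma-dev \
    && rm -rf /var/lib/apt/lists/*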

How to install additional dependencies in Tensorman

I am on Pop!_OS 20.04 LTS and I want to use Tensorman for TensorFlow/Python. I'm new to Docker, and I want to install additional dependencies. For example, using the default image I can run a Jupyter notebook with these commands:
tensorman run -p 8888:8888 --gpu --python3 --jupyter bash
jupyter notebook --ip=0.0.0.0 --no-browser
But now I need to install additional dependencies. For example, if I want to install jupyterthemes, how can I do that? I have tried installing it directly inside the Docker container, but it doesn't work that way.
This issue looks similar to my problem, but it doesn't explain exactly how to make a custom image in Tensorman.
There are two ways to install dependencies:
1. Create a custom image, install the dependencies, and save it.
2. Use the --root flag to gain root access to the container, install the dependencies, and use them.
Build your own custom image
If you are working on a project and want certain dependencies for it, or just want to save all your favourite dependencies, you can create a custom image for that project, save it, and use it later.
Make a list of all the packages you need. Once you are ready, use this command:
tensorman run -p 8888:8888 --root --python3 --gpu --jupyter --name CONTAINER_NAME bash
Here CONTAINER_NAME is the name of the container (you can give it any name you want) and -p sets the port (you can read up on port forwarding in Docker).
Now you are running the container as root. In the container shell, run:
# it's always a good idea to update apt first; you can also install packages from apt
apt update
# install jupyterthemes
pip install jupyterthemes
# check if all your desired packages are installed
pip list
Now it's time to save your image.
Open a new terminal and run this command:
tensorman save CONTAINER_NAME IMAGE_NAME
CONTAINER_NAME should be the one used earlier; IMAGE_NAME can be whatever you prefer.
Now you can close the terminals. Use tensorman list to check whether your custom image is listed. To use your custom image, run:
tensorman =IMAGE_NAME run -p 8888:8888 --gpu bash
# to use jupyter
jupyter notebook --ip=0.0.0.0 --no-browser
Use --root and install dependencies
You might be wondering why you can't install dependencies from inside the notebook, as you can with a normal Jupyter setup. That's because the container is not running as root: if it ran as root, files exported to the host machine would also end up with root permissions, which is why it's generally good to avoid the --root flag. We can still use it to install dependencies, though. After installing, save the image, otherwise the installed dependencies will be lost (saving isn't strictly necessary; you could also reinstall them every time).
In the last step of building the custom image, use these commands instead:
# notice --root
tensorman =IMAGE_NAME run -p 8888:8888 --gpu --root bash
# to use jupyter, notice --allow-root
jupyter notebook --allow-root --ip=0.0.0.0 --no-browser

How can I generate a requirements.txt file for a package not available on my development platform?

I'm trying to generate requirements/dev.txt and prod.txt files for my Python project. I'm using pip-compile-multi to generate them from base.in, dev.in, and prod.in files. Everything works great until I add tensorflow-gpu==2.0.0a0 to the prod.in file. When I do, I get this error: RuntimeError: Failed to pip-compile requirements/prod.in.
I believe this is because tensorflow-gpu is only available on Linux, and my dev machine is a Mac. (If I run pip install tensorflow-gpu==2.0.0a0 I am told there is no distribution for my platform.) Is there a way I can generate a requirements.txt file for pip for a package that is not available on my platform? To be clear, my goal is to generate a requirements.txt file using something like pip-compile-multi (because that will version dependencies) that will only install on Linux, but I want to be able to actually generate the file on any platform.
Use environment markers (PEP 508):
tensorflow-gpu==2.0.0a0; sys_platform!='darwin'
You could run pip-compile-multi in a Docker container. That way you'd be running it under Linux, and you could do that on your Mac or other dev machines. As a one-liner, it might look something like this:
docker run --rm --mount type=bind,src=$(pwd),dst=/code -w /code python:3.8 bash -c "pip install pip-compile-multi && pip-compile-multi"
I haven't used pip-compile-multi, so I'm not exactly sure how you call it. Maybe you'd need to add some arguments to that command. Depending on how complicated your setup is, you could consider writing a Dockerfile and simplifying the one-liner a bit.
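As a rough sketch of that idea (I'm assuming pip-compile-multi can be run with no arguments from the project root, as in the one-liner above):
# Dockerfile for a small throwaway compile image
FROM python:3.8
RUN pip install pip-compile-multi
WORKDIR /code
CMD ["pip-compile-multi"]
Build and run it with your project mounted at /code:
docker build -t compile-reqs .
docker run --rm --mount type=bind,src=$(pwd),dst=/code compile-reqs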
Currently pip-tools doesn't support this; there's an open issue on GitHub. A workaround suggested by the author of pip-compile-multi is to generate a linux.txt on a Linux machine, and then statically include it from a non-generated linux-prod.txt like this:
-r prod.txt
-r linux.txt
I think you are looking for tensorflow-gpu==2.0.0a0 (note the pre-release suffix is written with no hyphen before the a). This looks like the version you want: https://pypi.org/project/tensorflow-gpu/2.0.0a0/
See the pip install command on that page. Hope this helps.

Prevent Docker from installing python package requirements on every build (without requirements.txt)

I'm building an image from a Dockerfile where my main program is a python application that has a number of dependencies. The application is installed via setup.py and the dependencies are listed inside. There is no requirements.txt. I'm wondering if there is a way to avoid having to download and build all of the application dependencies, which rarely change, on every image build. I saw a number of solutions that use the requirements.txt file but I'd like to avoid having one if possible.
You can use requires.txt from the egg info to preinstall the requirements.
WORKDIR path/to/setup/script
RUN python setup.py egg_info
RUN pip install -r pkgname.egg-info/requires.txt
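Put together, a Dockerfile using this approach might look roughly like this; pkgname is a placeholder for your project's package name, and if setup.py reads other files (such as a README) you would need to COPY those as well:
FROM python:3.7
WORKDIR /app
# copy only the packaging metadata first, so this layer stays cached
# until setup.py itself changes
COPY setup.py ./
RUN python setup.py egg_info && pip install -r pkgname.egg-info/requires.txt
# now copy the rest of the source and install the package itself
COPY . .
RUN pip install .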
One solution: since those dependencies rarely change, as you said, you could build another image that already has those packages installed. You would create that image and then save it using docker save, so you end up with a new base image containing the required dependencies. docker save creates a .tar of the image. You then load it with docker load, and in your Dockerfile you would do:
FROM <new image with all the dependencies>
# your stuff, no need to run pip install
...
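Roughly, the save/load round trip looks like this (image and file names here are just examples):
# on the machine where the dependency image was built
docker save -o deps-image.tar my-deps-image:latest
# on the machine that will build the application image
docker load -i deps-image.tar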
Hope it helps
See the docker save documentation for details.

Transfer virtualenv to docker image

Is it possible to transfer virtual environment data from a local host to a docker image via the ADD command?
Rather than doing pip installs inside the container, I would prefer the user to do all of that locally and simply transfer the virtual environment into the container. Granted, all of the files have the same names locally as in the docker container, and all directories are nested properly.
This would save minutes to hours if it were possible to transfer virtual environment settings into a docker image. Maybe I am thinking about this with the wrong abstraction.
It just feels very inefficient to do pip installs via a requirements.txt passed into the container, as opposed to doing it all locally; otherwise, each time the image is started up it has to re-install the same dependencies that have not changed between builds.
We had run into this problem earlier and here are a few things we considered:
Consider building base images that have the common packages installed. The app containers can then use one of these base images and install only the differential.
Cache the pip packages on a local path that can be mounted into the container. That will save the time spent downloading the packages (see the sketch below).
Depending on the complexity of your project, one may suit better than the other; you may also consider a hybrid approach for maximum optimization.
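A minimal sketch of the second idea, assuming a host directory ./pip-cache is used as the shared download location and my-image is a placeholder image that has pip available:
# download the wheels once on the host into ./pip-cache
pip download -r requirements.txt -d ./pip-cache
# install inside the container from the mounted cache, without hitting the network
docker run --rm -v $(pwd):/src -w /src my-image \
    pip install --no-index --find-links=/src/pip-cache -r requirements.txt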
While possible, it's not recommended.
Dependencies (library versions, globally installed packages) can differ between the host machine and the container.
Image builds will not be 100% reproducible on other hosts.
The impact of pip install is not big. Each RUN command creates its own layer, which is cached locally and also in the registry, so pip install will be re-run only when requirements.txt changes (or previous layers are rebuilt).
To trigger pip install only on requirements.txt changes, Dockerfile should start this way:
...
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY src/ ./
...
Also, it will be run only on image build, not container startup.
If you have multiple containers with the same dependencies, you can build an intermediate image with all the dependencies and build the other images FROM it.
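A sketch of that layout, with hypothetical image names:
# Dockerfile.base -- build once and tag it, e.g. docker build -f Dockerfile.base -t my-python-deps .
FROM python:3.8
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Dockerfile for each application image
FROM my-python-deps
COPY src/ ./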
