I am trying to use Docker to run numerical experiments (eventually on a cloud node such as AWS, but let's leave that for now). The code is in Python, with some underlying C libraries. The code changes frequently, so the Docker image needs to be recreated frequently. The parameter files also change for every experiment I run. I want to use Docker to reduce clutter on the machine I run my experiments on.
I don't want a Docker image per experiment sitting on my hard disk, so I wanted to know if there is a way to create, execute, and then delete a Docker image in sequence from a Python script.
You could use Python's subprocess module to call the necessary docker commands on Linux or Windows, depending on where your Docker runs. E.g., for Linux:
import subprocess
subprocess.call(["docker", "rmi", "<your-image-name>"])
subprocess.call(["docker", "build", "--tag", "<your-image-name>", "<dir-of-Dockerfile>"])
If your machine is Windows, it may need different arguments, which you can find by Googling.
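To put the whole create-execute-delete cycle asked about in one place, here is a rough sketch; the image name, Dockerfile directory, and experiment command are all placeholders:

import subprocess

IMAGE = "my-experiment-image"           # placeholder image name
BUILD_DIR = "path/to/dockerfile/dir"    # placeholder directory containing the Dockerfile

# 1. Build (create) the image from the Dockerfile.
subprocess.run(["docker", "build", "--tag", IMAGE, BUILD_DIR], check=True)

# 2. Run (execute) one experiment; --rm removes the container when it exits.
subprocess.run(["docker", "run", "--rm", IMAGE, "python", "run_experiment.py", "params.json"], check=True)

# 3. Delete the image again so it doesn't accumulate on disk.
subprocess.run(["docker", "rmi", IMAGE], check=True)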
A bioinformatics protocol was developed and we would like to dockerize it to make its usage easier for others. It consists of two pieces of software and several Python scripts to prepare and parse data. Locally, we run these modules on a cluster as several dependent jobs (some requiring high resources) with a wrapper script, like this:
1. parsing input data (Python)
2. running 10-100 jobs on a cluster (each taking a piece of the output of step 1), where each sub-step's jobs depend on the previous sub-step finishing:
   a) a compiled C++ software on each piece from step 1
   b) a parsing Python script on each piece from 2a
   c) another, resource-intensive compiled C++ software, which uses mpirun to distribute all the output of 2b
3. finalizing results (Python script) on all results from step 2
The dockerized version does not necessarily need to be organized in the same manner, but at least step 2c needs to be distributed with mpirun because users will run it on a cluster.
How could I organize this? Have X different containers in a workflow? Any other possible solution that does not involve multiple containers?
Thank you!
P.S. I hope I described it clearly enough, but I can clarify further if needed.
I think in your project it is important to differentiate between Docker images and Docker containers.
In a Docker image, you package your code and the dependencies needed to make it work. The first question is whether all your code should be in the same image: you have Python scripts and C++ software, so it could make sense to have several images, each capable of running one job of your process.
A Docker container is a running instance of a Docker image. So if you decide to have several images, you will have several Docker containers running during your process. If you decide to have only one image, you can run everything in one container by running your wrapper script inside it. Or you could write a new wrapper script that instantiates a Docker container for each step; this could be interesting, as you seem to use different hardware depending on the step.
I can't give specifics about mpirun, as I'm not familiar with it.
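For the second option (a wrapper script that starts one container per step), a very rough sketch could look like the following; every image name, command, and the shared data path are made up, and the mpirun line is only a placeholder:

import subprocess

# Hypothetical (image, command) pairs, one per pipeline step.
steps = [
    ("pipeline/parse-input",  ["python", "parse_input.py"]),
    ("pipeline/cpp-tool-a",   ["./tool_a", "chunk_0"]),
    ("pipeline/parse-chunks", ["python", "parse_chunks.py"]),
    ("pipeline/cpp-tool-mpi", ["mpirun", "-n", "4", "./tool_b"]),
    ("pipeline/finalize",     ["python", "finalize.py"]),
]

for image, command in steps:
    # check=True aborts the wrapper if a step fails, which enforces the
    # "each step depends on the previous one finishing" behaviour.
    subprocess.run(
        ["docker", "run", "--rm", "-v", "/shared/data:/data", image] + command,
        check=True,
    )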
I need to use a docker image which has compiled versions of certain programs that are very hard to compile from scratch.
I need to run a program in that environment.
I installed docker and pulled the image.
But how do I run a program (a Python program that works in the Docker environment) using the environment provided by Docker, passing a local folder in as input and getting the output back onto my local system?
So far, all the resources and tutorials show me how to work inside Docker, not how to solve the problem I described.
Thanks for your help.
You should look at bind mounts. Here’s the Docker documentation of those.
Essentially, that will mount a folder in the host as a folder in the container.
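A rough sketch of what that can look like, again driven from Python; the image name, script path, and folder names are placeholders:

import os
import subprocess

host_dir = os.path.abspath("experiment_data")   # local folder with inputs/outputs

# Mount the local folder at /data inside the container; whatever the program
# writes under /data ends up back in the local folder on the host.
subprocess.run([
    "docker", "run", "--rm",
    "-v", f"{host_dir}:/data",
    "the-pulled-image",
    "python", "/opt/tool/run.py", "--input", "/data/input", "--output", "/data/output",
], check=True)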
I currently have a handful of small Python scripts on my laptop that are set to run every 1-15 minutes, depending on the script in question. They perform various tasks for me like checking for new data on a certain API, manipulating it, and then posting it to another service, etc.
I have a NAS/personal server (unRAID) and was thinking about moving the scripts to there via Docker, but since I'm relatively new to Docker I wasn't sure about the best approach.
Would it be correct to take something like the Phusion Baseimage, which includes cron, package my scripts and crontab as dependencies of the image, and write the Dockerfile to initialize all of this? Or would it be a more canonical approach to modify the scripts so that they are threaded with recursive timers, and just run each script individually in its own official Python image?
No dude, just install Python in the Docker container/image, move your scripts over, and run them as normal.
You may have to expose some ports or add a firewall exception, but your container can behave like a native Linux environment.
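If you prefer the timer-based alternative mentioned in the question over cron, a minimal sketch of a scheduler that could serve as the container's entrypoint might look like this (script names and intervals are made up):

import subprocess
import threading

def run_every(interval_seconds, script_path):
    """Run script_path now, and then again every interval_seconds."""
    def task():
        subprocess.run(["python", script_path])           # one pass of the script
        threading.Timer(interval_seconds, task).start()   # re-arm the timer
    task()

run_every(60, "check_api.py")           # hypothetical script, every minute
run_every(15 * 60, "post_results.py")   # hypothetical script, every 15 minutes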
I am new to Docker; I have been through a good number of online tutorials and still haven't grasped the entire process. I understand that most of the tutorials have you pull from online public repositories.
But for my application I feel I need to create these images and run them as containers from my local or SSH'd computer. So I guess my overall question is: how can I create an image and run it as a container from nothing? I want to try it on something such as Python before I move on to the big project I will be doing in the future. My future project will be containerizing a weather research model.
I do not want someone to do it for me; I just have not had any luck finding documentation that I understand and that gives me a basis for doing this without pulling from online repositories. Any links or documentation would be greatly appreciated.
What I found confusing about Docker, and important to understand, was the difference between images and containers. Docker uses collections of files, together with your existing kernel, to create systems within your existing system. Containers are collections of files that are updated as you run them. Images are saved copies of files that cannot be modified. There's more to it than that, based on what commands you can use on them, but you can learn that as you go.
First you should download an existing image that has the base files for an operating system. You can use docker search to look for one. I wanted a small operating system that was 32 bit. I decided to try Debian, so I used docker search debian32. Once you find an image, use docker pull to get it. I used docker pull hugodby/debian32 to get my base image.
Next you'll want to build your own image on top of the base image. You'll want to create a 'Dockerfile' that has all of your commands for creating the image. However, if you're not certain about what you want in the system, you can use the image that you downloaded to create a container, make the changes (while writing down what you've done), and then create your 'Dockerfile' with the commands that perform those tasks afterward.
If you create a 'Dockerfile', you would then move into the directory with the 'Dockerfile' and, to build the image, run the command: docker build -t TAG . (the trailing dot tells Docker to use the current directory as the build context).
You can now create a container from the image and run it using:
docker run -it --name=CONTAINER_NAME TAG
CONTAINER_NAME is what you want to reference the container as, and TAG is the tag of the image that you downloaded, or the one that you previously assigned to the image created from the 'Dockerfile'.
Once you're inside the container you can install software and much of what you'd do with a regular Linux system.
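Putting the steps above together, here is a rough, hypothetical sketch of scripting the whole build-and-run cycle from Python; the base image, file names, and tag are all made up, and it also shows what a minimal 'Dockerfile' might contain:

import subprocess
from pathlib import Path

# A minimal, hypothetical Dockerfile: start from an official Python base image,
# copy a single script in, and set it as the default command.
dockerfile = """\
FROM python:3.11-slim
COPY model.py /app/model.py
CMD ["python", "/app/model.py"]
"""
Path("Dockerfile").write_text(dockerfile)

# Build the image from the current directory (the trailing ".") and tag it.
subprocess.run(["docker", "build", "-t", "weather-test", "."], check=True)

# Create and run a container from the freshly built image.
subprocess.run(["docker", "run", "--rm", "--name=weather_test", "weather-test"], check=True)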
Some additional commands that may be useful are:
CTRL-p CTRL-q # Exits the terminal of a running container without stopping it
docker --help # For a list of docker commands and options
docker COMMAND --help # For help with a specific docker command
docker ps -a # For a list of all containers (running or not)
docker ps # For a list of running containers
docker start CONTAINER_NAME # To start a container that isn't running
docker images # For a list of images
docker rmi TAG # To remove an image
I know there are a ton of articles, blogs, and SO questions about running Python applications inside a Docker container. But I am looking for information on doing the opposite. I have a distributed application with a lot of independent components, and I have put each of these components inside a Docker container that I can run manually via
docker run -d <MY_IMAGE_ID> mycommand.py
But I am trying to find a way (a pythonic, object-oriented way) of running these containerized applications from a python script on my "master" host machine.
I can obviously wrap the command line calls into a subprocess.Popen(), but I'm looking to see if something a bit more managed exists.
I have seen docker-py but I'm not sure if this is what I need or not; I can't find a way to use it to simply run a container, capture output, etc.
If anyone has any experience with this, I would appreciate any pointers.
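For reference, with docker-py (roughly version 3.x or later) the kind of thing described above might be sketched like this; the image ID and command are just the placeholders from the question:

import docker

client = docker.from_env()   # talk to the local Docker daemon

# Start the container detached, as in the manual `docker run -d ...` call.
container = client.containers.run("MY_IMAGE_ID", "mycommand.py", detach=True)

result = container.wait()                    # block until the container exits
print(container.logs().decode())             # capture the container's stdout/stderr
container.remove()                           # clean up the stopped container
print("exit status:", result["StatusCode"])  # wait() returns a dict on recent docker-py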