I have some questions about Docker. I have very little knowledge about it, so kindly bear with me.
I have a python script that does something and writes into a PostgreSQL DB. Both are run on Docker. Python uses python:3.8.8-buster and PostgreSQL postgres:13. Using docker-compose up, I am able to instantiate both these services and I see the items inserted in the PostgreSQL table. When I docker-compose down, as usual, the services shut down as expected. Here are the questions I have:
When I run the container of the PostgreSQL service by itself (not using docker-compose up, but docker run then docker exec) then login into db using PSQL, it doesn't take the db name as the db name mentioned in the docker-compose.yml file. It takes localhost, but with the username mentioned per the docker-compose.yml file. It also doesn't ask me for the password, although it's mentioned in the Dockerfile itself(not docker-compose.yml - for each of the services, I have a Dockerfile that I build in the docker-compose.yml). Is that expected? If so, why?
After I've logged in, when I SELECT * FROM DB_NAME; it displays 0 records. So, basically it doesn't display the records written in the DB in the previous run. Why's that? How can I see the contents of the DB when it's not up? When the container is running (when I docker-compose up), I know I can see the records from PG Admin (which BTW is also a part of my docker-compose.yml file, and I have it only to make it easier to see if the records have been written into the DB).
So after my script runs, and it writes into the db, it stops. Is there a way to restart it without docker-compose down then docker-compose up? (On VSCode) when I simply run the script while docker-compose is still up, it says it cannot find the db (that's mentioned in the docker-compose.yml file). So I have to go back and change the db name in the script to point to localhost. This circles back to question #1.
I am new to docker, and I am trying my best to wrap my head around all this.
This behavior depends on your specific setup. I will have to see the Dockerfile(s) and docker-compose.yaml in order to give a suitable answer.
This is probably caused by mounting an anonymous volume to your postgres service instead of a named volume. docker-compose down removes the containers, and the anonymous volumes they used are not re-attached to the new containers on the next docker-compose up. Named volumes are.
Here's a docker-compose.yaml example of how to mount a named volume called database:
version: '3.8'
# Defining the named volume
volumes:
  database:
services:
  database:
    image: 'postgres:latest'
    restart: 'always'
    environment:
      POSTGRES_USER: 'admin'
      POSTGRES_PASSWORD: 'admin'
      POSTGRES_DB: 'app'
    volumes:
      # Mounting the named volume
      - 'database:/var/lib/postgresql/data/'
    ports:
      - '5432:5432'
I assume this depends more on the contents of your script than on the way you configured your docker postgres service. Postgres does not shut down after simply writing data to it. But again, I will have to see the Dockerfile(s) and docker-compose.yaml (and the script) in order to provide a more suitable answer.
If you docker run an image, it always creates a new container, and it never looks at the docker-compose.yml. If you, for example, run
docker run --name postgres postgres
docker exec -it postgres ...
that starts a new container based on the postgres:latest image, with no particular storage or networking setup. That's why you can't use the Compose host name of the container or see any of the data that your Compose setup would normally have.
You can use docker-compose up to start a specific service and its dependencies, though:
docker-compose up -d postgres
Once you do this, you can use ordinary tools like psql to connect to the database through its published ports:
psql -h localhost -p 5433 my_db
You should not normally need debugging tools like docker exec; if you do, there is a Compose variant that knows about the Compose service names:
# For debugging use only -- not the primary way to interact with the DB
docker-compose exec postgres psql my_db
After my script runs, and it writes into the db, it stops. Is there a way to restart it?
Several options:
Make your script not stop, in whatever way. Frequently a Docker container will have something like an HTTP service that can accept requests and act on them.
Re-running docker-compose up -d (without explicitly down first) will restart anything that's stopped or anything whose Compose configuration has changed.
You can run a one-off script directly on the host, with configuration pointing at your database's published ports.
It's relevant here that "in the Compose environment" and "directly on a developer system" are different environments, and you will need a mechanism like environment variables to communicate these. In the Compose environment you will need the database container's name and the default PostgreSQL port 5432; on a developer system you will typically need localhost as the host name and whichever port has been published. You cannot hard-code this configuration in your application.
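As a minimal sketch of that mechanism, the script could read its connection settings from environment variables and fall back to host-side defaults. The variable names (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) and the defaults are assumptions for illustration, not something the original setup defines.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    name: str
    user: str
    password: str


def load_db_settings(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical. The defaults target a database
    whose port is published on the developer's own machine.
    """
    env = os.environ if env is None else env
    return DbSettings(
        host=env.get("DB_HOST", "localhost"),
        port=int(env.get("DB_PORT", "5432")),
        name=env.get("DB_NAME", "app"),
        user=env.get("DB_USER", "admin"),
        password=env.get("DB_PASSWORD", ""),
    )


if __name__ == "__main__":
    # Print the settings the script would use in the current environment.
    print(load_db_settings())
```

In the Compose file, the script's service would set DB_HOST to the database service name under environment:, while a run from the host can rely on the defaults or set DB_PORT to whichever port was published.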
Related
I have a script setup for a lambda that uses Python and will eventually connect to a DynamoDB table. I setup everything locally (a virtual environment using pipenv) using the docker image AWS provides for DynamoDB and it all worked without a hitch. Then I tried to dockerize the Python. When I run my table creation script in my local virtual environment, it runs without a problem. When I run the same script from within my docker container, I get an error. I'm not sure what the difference is. Right now, I use this line to connect:
dynamodb = boto3.resource('dynamodb', endpoint_url="http://dynamodb:8000")
dynamodb is the name of the service in docker, so this should work. When I replace dynamodb:8000 with localhost:8000, it works fine in my local venv. When I run it in docker, I get
botocore.exceptions.NoRegionError: You must specify a region.
The big question is why it's looking for a region in docker, but not locally. Here's my docker-compose for good measure:
version: '3'
services:
  dynamodb:
    command: "-jar DynamoDBLocal.jar -sharedDb -optimizeDbBeforeStartup -dbPath ./data"
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb
    ports:
      - "8000:8000"
    volumes:
      - "./database_data:/home/dynamodblocal/data"
    working_dir: /home/dynamodblocal
  lambda:
    build: .
    container_name: user-rekognition-lambda
    volumes:
      - ./:/usr/src/app
In one of the AWS blogs, local AWS Glue, they share the ~/.aws/ in read-only mode with the docker container using volume option:
-v ~/.aws:/root/.aws:ro
This would be the easiest way for you to reuse your credentials from host workstation inside the docker.
boto3 has a bug that even if you explicitly give it an endpoint_url, it still wants to know a region_name. Why? I don't know, and it doesn't seem to use this name for anything as far as I can tell.
In most people's setups, $HOME/.aws/config contains some default choice of region, so boto3 picks up the region_name from that configuration, and then just ignores it...
Since probably your docker image doesn't have that file, the trivial solution is just to add region_name='us-east-1' (for example) explicitly to your boto3.resource() call. Again, the specific region name you choose won't matter - boto3 will connect to the URL you give it, not to that region.
So the full command becomes:
dynamodb = boto3.resource('dynamodb',
                          endpoint_url="http://dynamodb:8000",
                          region_name="us-east-1")
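If hard-coding the region and endpoint is undesirable, a sketch along these lines could take both from the environment instead. AWS_DEFAULT_REGION is a variable boto3 itself reads; DYNAMODB_ENDPOINT and the helper names are hypothetical and only used here for illustration.

```python
import os


def dynamodb_kwargs(env=None):
    """Build keyword arguments for boto3.resource('dynamodb', ...).

    DYNAMODB_ENDPOINT is a hypothetical variable name. When it is unset,
    no endpoint_url is passed and boto3 would talk to the real service.
    """
    env = os.environ if env is None else env
    kwargs = {
        # Any region satisfies the check when a local endpoint is used.
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }
    endpoint = env.get("DYNAMODB_ENDPOINT")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_dynamodb_resource(env=None):
    # Imported lazily so dynamodb_kwargs() can be used without boto3 installed.
    import boto3

    return boto3.resource("dynamodb", **dynamodb_kwargs(env))
```

With this, the lambda service in the Compose file could set DYNAMODB_ENDPOINT to http://dynamodb:8000, and the local venv could set it to http://localhost:8000, without changing the code.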
I am trying to set up this Bullet Train API server on our machines. I am successfully able to run their Python server using the docker-compose up method.
As per the docs, it needs the database as well. I preferred using the docker image for the Postgres DB docker run --name local_postgres -d -P postgres which returns this:
It doesn't return a success message saying if the Postgres Docker is running successfully or not. It just returns some kinda long string which I feel should be an identifier of the Postgres Docker image.
As I need to connect this Bullet Train API server to this Dockerized database -
The question is how to find the connection string for this Postgres Docker image?
The trick is to use docker-compose. Put your application image in there as one service and your postgres image as a second service. Then also include an overlay network in the stack and specify that in each of your services. After that it is possible for the application to access the database via the docker service's name "local_postgres" as the hostname.
Update as per your comment
Make sure that your dockerfile that defines the postgres container contains an EXPOSE command.
EXPOSE 5432
If missing, add it and rebuild the container.
Start the container and include the below option, which will expose the database port on localhost.
docker run --name local_postgres -p 5432:5432 -d -P postgres
Check if the port is really exposed by typing
docker ps | grep 'local_postgres'
You should see something like this in the output.
PORTS 0.0.0.0:5432->5432/tcp
If you see this output, the port 5432 is successfully exposed on your host. So if your app runs on localhost, you can access the database via localhost:5432
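To turn that into a connection string, a minimal sketch could look like the following. It assumes the official postgres image defaults (user and database both named postgres unless POSTGRES_USER or POSTGRES_DB were set); the password is whatever was passed through POSTGRES_PASSWORD, and the value used in the usage lines is a placeholder. The helper names are hypothetical.

```python
import socket
from urllib.parse import quote


def postgres_uri(user, password, host="localhost", port=5432, dbname="postgres"):
    """Build a postgresql:// URI, percent-encoding user, password and db name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "change-me" is a placeholder, not a real credential.
    print(postgres_uri("postgres", "change-me"))
    print("port 5432 reachable:", port_is_open("localhost", 5432))
```

The port check only confirms that something is listening on the published port; it does not verify the credentials.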
I am using pg_cron to schedule a task which should be repeated every 1 hour.
I have installed it inside the postgres container and am using it in a Docker environment.
And I am calling the query to create this job using python from a different container.
I can see that job is created successfully but is not being executed due to lack of permission since the pg_hba.conf is not set to trust or due to no .pgpass file.
But if I enable either of those, anyone can get into the database by using docker exec and running psql in the container.
Is there any way to avoid this security issue? In a production environment, nobody should be able to get into the database without a password.
Either keep people from running docker exec on the container or use something else than pg_cron.
I would feel nervous if random people were allowed to run docker exec on the container with my database or my job scheduler in it.
After installing Docker and googling for hours now, I can't figure out how to place data in a Docker, it seems to become more complex by the minute.
What I did; installed Docker and ran the image that I want to use (kaggle/python). I also read several tutorials about managing and sharing data in Docker containers, but no success so far...
What I want: for now, I simply want to be able to download GitHub repositories+other data to a Docker container. Where and how do I need to store these files? I prefer using GUI or even my GitHub GUI, but simple commands would also be fine I suppose.. Is it also possible to place data or access data from a Docker that is currently not active?
Note that I also assume you are using Linux containers. This works on all platforms, but on Windows you need to tell your docker process that you are dealing with Linux containers. (It's a dropdown in the tray.)
It takes a bit of work to understand docker and the only way to understand it is to get your hands dirty. I recommend starting with making an image of an existing project. Make a Dockerfile and play with docker build . etc.
To cover the docker basics (fast version) first.
In order to run something in docker we first need to build an image
An image is a collection of files
You can add files to an image by making a Dockerfile
Using the FROM keyword on the first line you extend an image
by adding new files to it, creating a new image
When starting a container we need to tell it what image it should use
and all the files in the image are copied into the container's storage
The simplest way to get files inside a container:
Create your own image using a Dockerfile and copy in the files
Map a directory on your computer/server into the container
You can also use docker cp to copy files to and from a container,
but that's not very practical in the long run.
(docker-compose automates a lot of these things for you, but you should probably also play around with the docker command to understand how things work. A compose file is basically a format that stores arguments to the docker command so you don't have to write commands that are multiple lines long)
A "simple" way to configure multiple projects in docker in local development.
In your project directory, add a docker-dev folder (or whatever you want to call it) that contains an environment file and a compose file. The compose file is responsible for telling docker how it should run your projects. You can of course make a compose file for each project, but this way you can run them easily together.
projects/
    docker-dev/
        .env
        docker-compose.yml
    project_a/
        Dockerfile
        # .. all your project files
    project_b/
        Dockerfile
        # .. all your project files
The values in .env are passed as variables to the compose file. We simply add the full path to the project directory for now.
PROJECT_ROOT=/path/to/your/project/dir
The compose file will describe each of your project as a "service". We are using compose version 2 here.
version: '2'
services:
  project_a:
    # Assuming this is a Django project and we override command
    build: ${PROJECT_ROOT}/project_a
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_a:/srv/project_a/
    ports:
      # Map port 8000 in the container to your computer at port 8000
      - "8000:8000"
  project_b:
    # Assuming this is also a Django project; it uses the Dockerfile's default command
    build: ${PROJECT_ROOT}/project_b
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_b:/srv/project_b/
This will tell docker how to build and run the two projects. We are also mapping the source on your computer into the container so you can work on the project locally and see instant updates in the container.
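To make the .env substitution concrete, here is a rough illustration in Python of what happens to ${PROJECT_ROOT} in the compose file. This is not how Compose is implemented; parse_env is a hypothetical helper that handles only simple KEY=VALUE lines.

```python
from string import Template


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Contents of the .env file from the example above.
env = parse_env("PROJECT_ROOT=/path/to/your/project/dir\n")

# The build path as written in the compose file, before substitution.
build_path = Template("${PROJECT_ROOT}/project_a").substitute(env)
print(build_path)  # /path/to/your/project/dir/project_a
```

Running docker-compose config in the docker-dev directory shows the compose file with the variables already substituted, which is a quick way to check the paths.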
Now we need to create a Dockerfile for each of our projects, or docker will not know how to build the image for the project.
Example of a Dockerfile:
FROM python:3.6
# Install the dependencies listed in requirements.txt
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
# Copy the project into the image
# We don't need that now because we are mapping it from the host
# COPY . /srv/project_a
# If we need to expose a network port, make sure we specify that
EXPOSE 8000
# Set the current working directory
WORKDIR /srv/project_a
# Assuming we run django here
CMD python manage.py runserver 0.0.0.0:8000
Now we enter the docker-dev directory and try things out. Try to build a single project at a time.
docker-compose build project_a
docker-compose build project_b
To start the project in background mode.
docker-compose up -d project_a
Jumping inside a running container
docker-compose exec project_a bash
Just run the container in the foreground:
docker-compose run project_a
There is a lot of ground to cover, but hopefully this can be useful.
In my case I run a ton of web servers of different kinds. This gets really frustrating if you don't set up a proxy in docker so you can reach each container using a virtual host. You can for example use jwilder-nginx (https://hub.docker.com/r/jwilder/nginx-proxy/) to solve this in a super-easy way. You can edit your own hosts file and make fake name entries for each container (just add a .dev suffix so you don't override real DNS names).
The jwilder-nginx container will automagically send you to a specific container based on a virtualhost name you decide. Then you no longer need to map ports to your local computer except for the nginx container that maps to port 80.
For others who prefer using GUI, I ended up using portainer.
After installing portainer (which is done by using one simple command), you can open the UI by browsing to where it is running, in my case:
http://127.0.1.1:9000
There you can create a container. First specify a name and an image, then scroll down to 'Advanced container options' > Volumes > map additional volume. Click the 'Bind' button, specify a path in the container (e.g. '/home') and the path on your host, and you're done!
Add files to this host directory while your container is not running, then start the container and you will see your files in there. The other way around, accessing files created by the container, is also possible while the container is not running.
Note: I'm not sure whether this is the correct way of doing things. I will, however, edit this post as soon as I encounter any problems.
After pulling the image, you can use code like this in the shell:
docker run --rm -it -p 8888:8888 -v d:/Kaggles:/d kaggle/python
Run jupyter notebook inside the container
jupyter notebook --ip=0.0.0.0 --no-browser
This mounts the local directory onto the container having access to it.
Then go to http://localhost:8888 in the browser. When I open a new kernel, it's running Python 3.5. I don't recall doing anything special when pulling the image or setting up Docker.
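To confirm from inside the container that the bind mount is visible, a small sketch like this could be run in a notebook cell. The /d path is the container side of the -v option in the docker run command above; list_mounted_files is a hypothetical helper.

```python
from pathlib import Path


def list_mounted_files(mount_point="/d", limit=20):
    """Return up to `limit` entry names found directly under the mount point.

    An empty list means the directory is missing or has no entries.
    """
    root = Path(mount_point)
    if not root.is_dir():
        return []
    names = sorted(p.name for p in root.iterdir())
    return names[:limit]


if __name__ == "__main__":
    print(list_mounted_files())
```

If the list comes back empty even though the host directory has files in it, the host path in the -v option is the first thing to check.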
You can find more information from here.
You can also try using datmo in order to easily setup environment and track machine learning projects to make experiments reproducible. You can run datmo task command as follows for setting up jupyter notebook,
datmo task run 'jupyter notebook' --port 8888
It sets up your project and files inside the environment to keep track of your progress.
I am trying to back up a running postgres container, without luck. What I have so far is a data-only container with a /var/lib/postgresql/data volume, and a main database container running the official postgres image with --volumes-from pointing at the data-only container.
I tried backing up the /data directory of the running postgres container and then restoring it, but postgres then runs into a WAL mismatch problem (also, this won't work regardless, according to the postgres docs).
So, any way I could backup the database and then restore it, while the main database container is running?