Boto3 requiring AWS region in docker but not in local env - python

I have a script set up for a Lambda function that uses Python and will eventually connect to a DynamoDB table. I set up everything locally (a virtual environment using pipenv) using the Docker image AWS provides for DynamoDB, and it all worked without a hitch. Then I tried to dockerize the Python code. When I run my table creation script in my local virtual environment, it runs without a problem. When I run the same script from within my Docker container, I get an error. I'm not sure what the difference is. Right now, I use this line to connect:
dynamodb = boto3.resource('dynamodb', endpoint_url="http://dynamodb:8000")
dynamodb is the name of the service in docker, so this should work. When I replace dynamodb:8000 with localhost:8000, it works fine in my local venv. When I run it in docker, I get
botocore.exceptions.NoRegionError: You must specify a region.
The big question is why it's looking for a region in docker, but not locally. Here's my docker-compose for good measure:
version: '3'
services:
  dynamodb:
    command: "-jar DynamoDBLocal.jar -sharedDb -optimizeDbBeforeStartup -dbPath ./data"
    image: "amazon/dynamodb-local:latest"
    container_name: dynamodb
    ports:
      - "8000:8000"
    volumes:
      - "./database_data:/home/dynamodblocal/data"
    working_dir: /home/dynamodblocal
  lambda:
    build: .
    container_name: user-rekognition-lambda
    volumes:
      - ./:/usr/src/app

In one of the AWS blog posts (on running AWS Glue locally), they share ~/.aws/ with the Docker container in read-only mode using the volume option:
-v ~/.aws:/root/.aws:ro
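This would be the easiest way to reuse the credentials from your host workstation inside the container. Because ~/.aws/config usually also holds your default region, boto3 inside the container would then find a region as well.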
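Mounting ~/.aws also brings in ~/.aws/config, which is where boto3 normally finds a default region. To check whether the file the container sees actually has one, here is a minimal stdlib sketch. It only reads the [default] section or a [profile NAME] section and ignores environment variables, so it is a diagnostic aid under those assumptions, not a reimplementation of boto3's own lookup; the function name is hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional


def default_region_from_config(config_path: Path, profile: str = "default") -> Optional[str]:
    """Return the region set for `profile` in an AWS CLI-style config file, or None."""
    # interpolation=None so values containing '%' are read literally
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips a missing file, which then yields None below
    parser.read(config_path)
    # In ~/.aws/config the default profile is "[default]"; named ones are "[profile NAME]"
    section = profile if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)
    return None


if __name__ == "__main__":
    # When the container runs as root, Path.home() is /root, matching the mount above
    path = Path.home() / ".aws" / "config"
    print(default_region_from_config(path) or "no region configured")
```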

boto3 requires a region_name even if you explicitly give it an endpoint_url. The endpoint URL decides where requests are sent, but a region is still needed, for example to sign the requests (AWS Signature Version 4 includes the region).
In most people's setups, $HOME/.aws/config contains a default region, so boto3 picks up region_name from that configuration and never complains.
Since your Docker image probably doesn't have that file, the trivial solution is to add region_name='us-east-1' (for example) explicitly to your boto3.resource() call. The specific region name you choose doesn't matter much here: boto3 will connect to the URL you give it, not to that region, and with -sharedDb (as in your compose file) DynamoDB Local uses a single database file regardless of region.
So the full command becomes:
dynamodb = boto3.resource('dynamodb',
                          endpoint_url="http://dynamodb:8000",
                          region_name="us-east-1")
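A variant of the same fix that avoids hard-coding the Docker host name, so one script works both in the container and in the local venv. This is a sketch under assumptions: DYNAMODB_ENDPOINT_URL is a hypothetical variable name chosen here, while AWS_DEFAULT_REGION is a variable boto3 itself reads.

```python
import os
from typing import Any, Dict, Mapping, Optional


def dynamodb_resource_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for boto3.resource('dynamodb', ...) from the environment.

    DYNAMODB_ENDPOINT_URL is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        # Always pass a region so boto3 never raises NoRegionError
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }
    endpoint = env.get("DYNAMODB_ENDPOINT_URL")
    if endpoint:
        # Only override the endpoint when one is configured (e.g. DynamoDB Local)
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_dynamodb_resource():
    import boto3  # imported lazily so the helper above is usable without boto3 installed

    return boto3.resource("dynamodb", **dynamodb_resource_kwargs())
```

In the lambda service of the compose file you could then set DYNAMODB_ENDPOINT_URL=http://dynamodb:8000 under environment:, and use http://localhost:8000 in the local venv. Note that boto3 also needs some credentials to sign requests; DynamoDB Local does not check them against a real AWS account, so placeholder values are commonly used.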

Related

Docker with Python and PostgreSQL

I have some questions about Docker. I have very little knowledge about it, so kindly bear with me.
I have a python script that does something and writes into a PostgreSQL DB. Both are run on Docker. Python uses python:3.8.8-buster and PostgreSQL postgres:13. Using docker-compose up, I am able to instantiate both these services and I see the items inserted in the PostgreSQL table. When I docker-compose down, as usual, the services shut down as expected. Here are the questions I have:
When I run the PostgreSQL service's container by itself (not with docker-compose up, but with docker run and then docker exec) and log into the DB using psql, it doesn't use the DB name given in the docker-compose.yml file. It uses localhost, but with the username given in the docker-compose.yml file. It also doesn't ask me for the password, although the password is set in the Dockerfile itself (not in docker-compose.yml; each service has its own Dockerfile, which I build from docker-compose.yml). Is that expected? If so, why?
After I've logged in, when I SELECT * FROM DB_NAME; it displays 0 records. So, basically it doesn't display the records written in the DB in the previous run. Why's that? How can I see the contents of the DB when it's not up? When the container is running (when I docker-compose up), I know I can see the records from PG Admin (which BTW is also a part of my docker-compose.yml file, and I have it only to make it easier to see if the records have been written into the DB).
So after my script runs and writes into the DB, it stops. Is there a way to restart it without docker-compose down and then docker-compose up? (In VS Code) when I simply run the script while docker-compose is still up, it says it cannot find the DB (the one named in the docker-compose.yml file). So I have to go back and change the DB host in the script to point to localhost. This circles back to question #1.
I am new to docker, and I am trying my best to wrap my head around all this.
This behavior depends on your specific setup. I will have to see the Dockerfile(s) and docker-compose.yaml in order to give a suitable answer.
This is probably caused by mounting an anonymous volume to your postgres service instead of a named volume. After docker-compose down removes the containers, the next docker-compose up creates fresh containers with new anonymous volumes, so the old data is not reattached. Named volumes are reattached by name.
Here's a docker-compose.yaml example of how to mount a named volume called database:
version: '3.8'
# Defining the named volume
volumes:
  database:
services:
  database:
    image: 'postgres:latest'
    restart: 'always'
    environment:
      POSTGRES_USER: 'admin'
      POSTGRES_PASSWORD: 'admin'
      POSTGRES_DB: 'app'
    volumes:
      # Mounting the named volume
      - 'database:/var/lib/postgresql/data/'
    ports:
      - '5432:5432'
I assume this depends more on the contents of your script than on the way you configured your Docker postgres service. Postgres does not shut down after simply writing data to it. But again, I will have to see the Dockerfile(s) and docker-compose.yaml (and the script) in order to provide a more suitable answer.
If you docker run an image, it always creates a new container, and it never looks at the docker-compose.yml. For example, if you run
docker run --name postgres postgres
docker exec -it postgres ...
that starts a new container based on the postgres:latest image, with no particular storage or networking setup. That's why you can't use the Compose host name of the container or see any of the data that your Compose setup would normally have.
You can use docker-compose up to start a specific service and its dependencies, though:
docker-compose up -d postgres
Once you do this, you can use ordinary tools like psql to connect to the database through its published ports (here assuming the Compose file publishes the container's port 5432 as host port 5433):
psql -h localhost -p 5433 my_db
You should not normally need debugging tools like docker exec; if you do, there is a Compose variant that knows about the Compose service names:
# For debugging use only -- not the primary way to interact with the DB
docker-compose exec postgres psql my_db
After my script runs, and it writes into the db, it stops. Is there a way to restart it?
Several options:
Make your script not stop, in whatever way. Frequently a Docker container will have something like an HTTP service that can accept requests and act on them.
Re-running docker-compose up -d (without explicitly down first) will restart anything that's stopped or anything whose Compose configuration has changed.
You can run a one-off script directly on the host, with configuration pointing at your database's published ports.
It's relevant here that "in the Compose environment" and "directly on a developer system" are different environments, and you will need a mechanism like environment variables to communicate these. In the Compose environment you will need the database container's name and the default PostgreSQL port 5432; on a developer system you will typically need localhost as the host name and whichever port has been published. You cannot hard-code this configuration in your application.
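A minimal sketch of that idea in Python. DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASSWORD are hypothetical variable names; the defaults target a script run directly on the host against a published port and reuse the 'app'/'admin' values from the Compose example above. The returned keys match keyword arguments accepted by psycopg2.connect().

```python
import os
from typing import Dict, Mapping, Optional


def postgres_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Read connection settings from (hypothetically named) environment variables."""
    env = os.environ if env is None else env
    return {
        # On the host: localhost + published port; in Compose: service name + 5432
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "app"),
        "user": env.get("DB_USER", "admin"),
        "password": env.get("DB_PASSWORD", ""),
    }
```

In docker-compose.yml you would set DB_HOST to the database service name (database in the example above) and DB_PORT to 5432, then connect with something like psycopg2.connect(**postgres_settings()); when running the script from VS Code on the host, the defaults or a DB_PORT matching the published port apply.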

Value of Airflow variable gets invalid on restarting docker container

There is a long list of DAGs and associated Airflow variables on a remote Airflow instance, a copy of which is running on my local system. All the variables from the remote Airflow instance are imported into my local Airflow instance.
I pulled the Airflow image into Docker and then started the container. Everything works fine and I can access the Airflow UI from my local system.
Problem:
Whenever I restart the Airflow container, all the variables that were imported during the previous container run become invalid.
Work Around
Import the variables again to fix the variable related error.
However, it's really frustrating to import the variables every time the container starts. There must be a smarter way of doing this. Please help me understand what I am doing wrong.
A new encryption key is generated when the Docker container is restarted.
To ensure that the same encryption key is used you will have to either hardcode a FERNET_KEY in the config file or pass an env variable when the container is initially run.
docker run -it -p 8888:8080 -v D:\dev\Dags:/usr/local/airflow/dags -e FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs= --name my_airflow_dags airflow_image
The Fernet key can be any valid Fernet key (32 url-safe base64-encoded bytes), but it has to stay the same: once it is provided, the same key is reused every time the container is restarted.
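A Fernet key is 32 random bytes encoded as url-safe base64, which gives 44 characters. The usual way to create one is cryptography's Fernet.generate_key(); the stdlib-only sketch below produces a key in the same format and adds a basic format check, which also catches a value that accidentally includes surrounding quotes. Both function names are hypothetical.

```python
import base64
import binascii
import os


def generate_fernet_key() -> str:
    """Return a new key in Fernet's format: 32 random bytes, url-safe base64-encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Basic format check: the value must strictly decode to exactly 32 bytes."""
    try:
        # altchars maps the url-safe '-' and '_'; validate=True rejects stray characters such as quotes
        raw = base64.b64decode(key.encode("ascii"), altchars=b"-_", validate=True)
    except (binascii.Error, UnicodeEncodeError):
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # Pass the printed value as FERNET_KEY / AIRFLOW__CORE__FERNET_KEY
    print(generate_fernet_key())
```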
The root cause of this problem is Airflow's encryption mechanism for key-value variables.
When you import your variables manually, their is_encrypted attribute is automatically set to True.
Whenever you restart the container, a new encryption key is generated, so the previously encrypted values become invalid.
You have three options:
Set the fernet_key explicitly in airflow.cfg
Set the AIRFLOW__CORE__FERNET_KEY Environment Variable in docker-compose.yml
Set the is_encrypted attributes to False (Admin UI, CLI, an UPDATE SQL query, ...)
I personally chose the second one, so my docker-compose.yml file looks like this (in the list form of environment, quotes would become part of the value, so the key is left unquoted):
environment:
  - LOAD_EX=n
  - EXECUTOR=Local
  - AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
thanks to wittfabian

How to connect a Postgres Docker image from another Docker container

I am trying to set up this Bullet Train API server on our machines. I am successfully able to run their Python server using the docker-compose up method.
As per the docs, it needs a database as well. I preferred using the Docker image for the Postgres DB: docker run --name local_postgres -d -P postgres
It doesn't return a success message saying whether the Postgres container is running successfully or not. It just returns a long string, which I assume is an identifier of the Postgres Docker image.
As I need to connect this Bullet Train API server to this Dockerized database, the question is: how do I find the connection string for this Postgres container?
The trick is to use docker-compose. Put your application image in there as one service and your postgres image as a second service. Compose attaches all services in the file to a shared default network (you can also declare your own network and attach each service to it). After that, the application can reach the database using the Compose service's name, e.g. "local_postgres", as the hostname.
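With both services on one network, the "connection string" is mostly a matter of using the service name as the host. A small sketch that builds a libpq-style URI (postgresql://user:password@host:port/dbname), assuming the official postgres image defaults (user postgres, database named after the user, port 5432) and a password you set yourself via POSTGRES_PASSWORD; the helper name is hypothetical.

```python
from urllib.parse import quote


def postgres_url(host: str, user: str, password: str, dbname: str, port: int = 5432) -> str:
    """Build a libpq-style connection URI."""
    # Percent-encode credentials so characters like '@' or ':' don't break the URI
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )
```

From another Compose service you would call postgres_url("local_postgres", "postgres", "<your password>", "postgres"); from the host, use "localhost" and the published port instead. Whether Bullet Train reads such a URI, and from which setting, depends on its own configuration docs.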
Update as per your comment
Make sure that the image behind your postgres container contains an EXPOSE instruction (the official postgres image already has one):
EXPOSE 5432
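If it is missing from your own image, add it and rebuild the image. Note that EXPOSE on its own only documents the port; it is -p (or -P) that publishes it to the host.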
Start the container with the option below, which publishes the database port on the host (localhost):
docker run --name local_postgres -p 5432:5432 -d -P postgres
Check whether the port is really published by running
docker ps | grep 'local_postgres'
You should see something like this in the output.
PORTS 0.0.0.0:5432->5432/tcp
If you see this output, port 5432 is successfully published on your host. So if your app runs directly on the host, you can access the database via localhost:5432.
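If you would rather check from a script than from docker ps, a TCP probe of the published port is a quick sanity test. This sketch only shows that something accepts connections on that port, not that it is PostgreSQL or that your credentials work; the function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print(port_is_open("localhost", 5432))
```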

How to use docker python SDK in swarm context?

In order to use command like :
client.nodes
We need to run Python code on a machine that is a manager inside a swarm. But how am I supposed to launch the Python program?
There is no easy way to install Python on the Docker machine, and I don't think it's a good idea to proceed like that anyway.
And if you launch Python in a container, you're not in a swarm context.
The only way I found was to launch the Python program in the Docker Quickstart Terminal on Windows and make the 'default' machine a manager in the swarm.
But now I need to do it on Ubuntu, so I can't use this solution.
(If there is an equivalent of the Docker Quickstart Terminal, I'm interested.)
I finally found the solution: use the Docker daemon's socket from one of the manager nodes.
Inside your docker-compose file, create a service for your Python program and add the following volume:
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
Don't forget to add a constraint so that your service runs only on a manager node:
deploy:
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
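With the socket mounted and the service pinned to a manager, the Docker SDK for Python (the docker package) can reach client.nodes. A minimal sketch, assuming that package is installed in the image; summarize_nodes is a hypothetical helper that pulls a few fields out of each node's attrs, which follow the Docker Engine API's Node object.

```python
from typing import Any, Dict, List, Tuple


def summarize_nodes(node_attrs: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Reduce raw swarm node attributes to (hostname, role, state) tuples."""
    return [
        (
            attrs.get("Description", {}).get("Hostname", "?"),
            attrs.get("Spec", {}).get("Role", "?"),
            attrs.get("Status", {}).get("State", "?"),
        )
        for attrs in node_attrs
    ]


def list_swarm_nodes() -> List[Tuple[str, str, str]]:
    import docker  # imported lazily so summarize_nodes works without the SDK installed

    # Talk to the manager's daemon through the socket mounted into the container
    client = docker.DockerClient(base_url="unix:///var/run/docker.sock")
    try:
        # client.nodes only works when the daemon behind the socket is a swarm manager
        return summarize_nodes([node.attrs for node in client.nodes.list()])
    finally:
        client.close()


if __name__ == "__main__":
    for hostname, role, state in list_swarm_nodes():
        print(f"{hostname}\t{role}\t{state}")
```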

How to use data in a docker container?

After installing Docker and googling for hours now, I can't figure out how to place data in a Docker container; it seems to become more complex by the minute.
What I did: installed Docker and ran the image that I want to use (kaggle/python). I also read several tutorials about managing and sharing data in Docker containers, but no success so far...
What I want: for now, I simply want to be able to download GitHub repositories and other data to a Docker container. Where and how do I need to store these files? I prefer using a GUI or even my GitHub GUI, but simple commands would also be fine, I suppose. Is it also possible to place data in, or access data from, a Docker container that is not currently running?
Note that I also assume you are using Linux containers. This works on all platforms, but on Windows you need to tell your Docker process that you are dealing with Linux containers (it's a dropdown in the tray).
It takes a bit of work to understand docker and the only way to understand it is to get your hands dirty. I recommend starting with making an image of an existing project. Make a Dockerfile and play with docker build . etc.
To cover the Docker basics (fast version) first:
In order to run something in Docker, we first need to build an image.
An image is a collection of files.
You can add files to an image by writing a Dockerfile.
Using the FROM keyword on the first line, you extend an image by adding new files to it, creating a new image.
When starting a container, we need to tell Docker which image it should use, and the container starts with all the files in the image (Docker adds a thin writable layer on top for the container's own changes rather than copying everything).
The simplest ways to get files inside a container:
Create your own image using a Dockerfile and copy in the files.
Map a directory on your computer/server into the container.
You can also use docker cp to copy files to and from a container, but that's not very practical in the long run.
(docker-compose automates a lot of these things for you, but you should probably also play around with the docker command to understand how things work. A compose file is basically a format that stores arguments to the docker command so you don't have to write commands that are multiple lines long)
A "simple" way to configure multiple projects in docker in local development.
In your project directory, add a docker-dev folder (or whatever you want to call it) that contains an environment file and a compose file. The compose file is responsible for telling docker how it should run your projects. You can of course make a compose file for each project, but this way you can run them easily together.
projects/
  docker-dev/
    .env
    docker-compose.yml
  project_a/
    Dockerfile
    # .. all your project files
  project_b/
    Dockerfile
    # .. all your project files
The values in .env are passed as variables to the compose file. We simply add the full path to the project directory for now.
PROJECT_ROOT=/path/to/your/project/dir
The compose file describes each of your projects as a "service". We are using compose file version 2 here.
version: '2'
services:
  project_a:
    # Assuming this is a Django project and we override command
    build: ${PROJECT_ROOT}/project_a
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_a:/srv/project_a/
    ports:
      # Map port 8000 in the container to your computer at port 8000
      - "8000:8000"
  project_b:
    # A second project; no command override, so its Dockerfile's CMD is used
    build: ${PROJECT_ROOT}/project_b
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_b:/srv/project_b/
This will tell docker how to build and run the two projects. We are also mapping the source on your computer into the container so you can work on the project locally and see instant updates in the container.
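The ${PROJECT_ROOT} references are filled in from .env when Compose reads the file. The sketch below imitates that with Python's string.Template purely to make the mechanism concrete; it is a simplified illustration (no quoting rules, no defaults such as ${VAR:-x}), not how Compose is implemented, and both function names are hypothetical.

```python
from string import Template
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (simplified .env reader)."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def interpolate(compose_text: str, values: Dict[str, str]) -> str:
    """Replace ${VAR} references with values from the parsed .env file."""
    # substitute() raises KeyError for unknown names; Compose instead warns and uses ""
    return Template(compose_text).substitute(values)
```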
Now we need to create a Dockerfile for each of our projects, or Docker will not know how to build the image for the project.
Example of a Dockerfile:
FROM python:3.6
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
# Copy the project into the image
# We don't need that now because we are mapping it from the host
# COPY . /srv/project_a
# If we need to expose a network port, make sure we specify that
EXPOSE 8000
# Set the current working directory
WORKDIR /srv/project_a
# Assuming we run django here
CMD python manage.py runserver 0.0.0.0:8000
Now we enter the docker-dev directory and try things out. Try to build a single project at a time.
docker-compose build project_a
docker-compose build project_b
To start the project in background mode.
docker-compose up -d project_a
Jumping inside a running container
docker-compose exec project_a bash
To run a one-off container in the foreground:
docker-compose run project_a
There is a lot of ground to cover, but hopefully this can be useful.
In my case I run a ton of web servers of different kinds. This gets really frustrating if you don't set up a proxy in Docker so you can reach each container using a virtual host. You can, for example, use jwilder-nginx (https://hub.docker.com/r/jwilder/nginx-proxy/) to solve this in a super-easy way. You can edit your own hosts file and add fake name entries for each container (use a reserved suffix such as .test so you don't shadow real DNS names; .dev is now a real top-level domain).
The jwilder-nginx container will automagically send you to a specific container based on a virtualhost name you decide. Then you no longer need to map ports to your local computer except for the nginx container that maps to port 80.
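For the hosts-file part, each fake name just has to point at the machine running the nginx proxy. A small helper that formats such lines, with project-a.test and project-b.test as hypothetical names; each proxied container would then declare the matching name in its VIRTUAL_HOST environment variable, which is how jwilder/nginx-proxy picks the target container.

```python
from typing import Iterable


def hosts_entries(hostnames: Iterable[str], ip: str = "127.0.0.1") -> str:
    """Format hosts-file lines pointing each virtual host name at the local proxy."""
    return "".join(f"{ip}\t{name}\n" for name in hostnames)


if __name__ == "__main__":
    # Append the output to /etc/hosts (needs admin rights) or the Windows hosts file
    print(hosts_entries(["project-a.test", "project-b.test"]), end="")
```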
For others who prefer using GUI, I ended up using portainer.
After installing portainer (which is done by using one simple command), you can open the UI by browsing to where it is running, in my case:
http://127.0.1.1:9000
There you can create a container. First specify a name and an image, then scroll down to 'Advanced container options' > Volumes > map additional volume. Click the 'Bind' button, specify a path in the container (e.g. '/home') and the path on your host, and you're done!
Add files to this host directory while your container is not running, then start the container and you will see your files in there. The other way around, accessing files created by the container, also works while the container is not running.
Note: I'm not sure whether this is the correct way of doing things. I will, however, edit this post as soon as I encounter any problems.
After pulling the image, you can run a command like this in the shell:
docker run --rm -it -p 8888:8888 -v d:/Kaggles:/d kaggle/python
Then run Jupyter Notebook inside the container:
jupyter notebook --ip=0.0.0.0 --no-browser
The -v option mounts the local directory (d:/Kaggles) at /d inside the container, so the container has access to it.
Then go to the browser and open http://localhost:8888; when I open a new kernel, it runs Python 3.5. I don't recall doing anything special when pulling the image or setting up Docker.
You can also try using datmo to easily set up an environment and track machine learning projects to make experiments reproducible. You can run the datmo task command as follows to set up a Jupyter notebook:
datmo task run 'jupyter notebook' --port 8888
It sets up your project and files inside the environment to keep track of your progress.
