Programmatically create SSH tunnel inside of dockerized Apache Airflow Python operator - python

My program is unable to create an SSH tunnel from inside my Docker container running Apache Airflow, although running the same function on my local machine works fine. I have a list of servers which I use to create a tunnel, query the database, and close the connection. Normally, I'd do it the following way:
import sshtunnel

for server in servers:
    server_conn = sshtunnel.SSHTunnelForwarder(
        server,
        ssh_username=ssh_user,
        ssh_password=ssh_password,
        remote_bind_address=('localhost', db_port),
        local_bind_address=('localhost', localport)
    )
    server_conn.start()
    # ... query the database through the local bind, then server_conn.stop()
This works as expected and I can do whatever I need from there. However, within Docker, it does not work. I realize that Docker runs with its own network and port bindings and is not actually a part of the host system, so I used network_mode="host" to help mitigate this issue. However, that does not work either, because my containers lose the ability to communicate with one another. Here is my docker-compose file:
postgres:
    image: postgres:9.6
    environment:
        - POSTGRES_USER=airflow
        - POSTGRES_PASSWORD=airflow
        - POSTGRES_DB=airflow
        - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
        - ~/.whale/pgdata:/var/lib/postgresql/data/pgdata
        - ./dags/docker/sql/create.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
        - "5432:5432"

webserver:
    image: hawk
    build:
        context: .
        dockerfile: ./dags/docker/Dockerfile-airflow
    restart: always
    depends_on:
        - postgres
        # - redis
    environment:
        - LOAD_EX=n
        - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
        - EXECUTOR=Local
    volumes:
        - ./dags:/usr/local/airflow/dags
        # Uncomment to include custom plugins
        # - ./plugins:/usr/local/airflow/plugins
    ports:
        - "8080:8080"
        - "52023:22"
    command: webserver
    healthcheck:
        test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
        interval: 30s
        timeout: 30s
        retries: 3
I also followed the instructions here and got to the point where I can docker exec into my container and manually type the above python snippet and get a working connection.
Additionally, I have read the Airflow documentation here, which covers SSH connection operators, but those only support bash commands and I need my Python function to run. I am truly confused why the Python code works while exec-ed into the container, but not when I run it via my Airflow DAG. At this time, I am unable to manually put all of the connections in, because there will be > 100 once this system deploys. Any help would be greatly appreciated. If more depth is needed, please let me know.

I was having this same issue when opening the tunnel and trying to connect to the database in separate tasks, but got it working by doing both in the same task (Airflow doesn't persist state between task runs):
# Airflow 1.10-style imports; in Airflow 2.x these hooks/operators live in provider packages
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

def select_from_tunnel_db():
    # Open SSH tunnel through the bastion host
    ssh_hook = SSHHook(ssh_conn_id='bastion-ssh-conn', keepalive_interval=60)
    tunnel = ssh_hook.get_tunnel(5432, remote_host='<db_host>', local_port=5432)
    tunnel.start()

    # Connect to DB through the tunnel and run query
    pg_hook = PostgresHook(
        postgres_conn_id='remote-db-conn',  # NOTE: host='localhost'
        schema='db_schema'
    )
    pg_cursor = pg_hook.get_conn().cursor()
    pg_cursor.execute('SELECT * FROM table;')
    select_val = pg_cursor.fetchall()
    return select_val

python_operator = PythonOperator(
    task_id='test_tunnel_conn',
    python_callable=select_from_tunnel_db,
    dag=dag
)
This forwards traffic on port 5432 from the local machine to the same port on the remote DB host. The SSHHook requires a working SSH connection to the endpoint you are tunneling through, and the PostgresHook requires a Postgres connection to 'localhost' on port 5432.
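For completeness, here is a minimal sketch of the same pattern without the Airflow hooks, closer to the sshtunnel code in the original question. The function name and parameters are illustrative, and it assumes psycopg2 and sshtunnel are installed in the Airflow image; the tunnel and the query both happen inside one task.

import psycopg2
import sshtunnel

def query_through_tunnel(server, ssh_user, ssh_password,
                         db_user, db_password, db_name,
                         db_port=5432, local_port=5432):
    # Open the tunnel, run the query, and tear the tunnel down in one task run
    with sshtunnel.SSHTunnelForwarder(
        server,
        ssh_username=ssh_user,
        ssh_password=ssh_password,
        remote_bind_address=('localhost', db_port),
        local_bind_address=('localhost', local_port),
    ) as tunnel:
        conn = psycopg2.connect(
            host='localhost', port=tunnel.local_bind_port,
            user=db_user, password=db_password, dbname=db_name,
        )
        with conn:
            with conn.cursor() as cur:
                cur.execute('SELECT 1;')
                rows = cur.fetchall()
        conn.close()
        return rows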

Related

psycopg2.OperationalError: could not translate host name to address using psycopg2 and docker

I've been trying for a few hours now, but no solution from similar questions seems to work for me...
I am using docker-compose to set up a PostgreSQL database and run a Python webserver from which I want to connect to my PostgreSQL database (so it's running inside the container):
version: '3.8'

services:
    database:
        container_name: database
        hostname: database
        image: postgres
        restart: always
        environment:
            POSTGRES_DB: mydatabase
            POSTGRES_USER: postgres
            POSTGRES_PASSWORD: password
        volumes:
            - postgres:/pgdata
            - ./application/ressources/fixtures.sql:/docker-entrypoint-initdb.d/fixtures.sql
        ports:
            - "5432:5432"
    application:
        container_name: application
        build: .
        ports:
            - "5001:5001"
        volumes:
            - ./application:/application
        restart: always
        depends_on:
            - database

volumes:
    postgres:
I am trying to connect as follows (I have read that, despite the depends_on entry, the database needs some more time until it can accept connections, so I added retry logic):
# assumes: import logging, psycopg2; from time import sleep
retries = 0
while retries < 5:
    retries = retries + 1
    self.conn = psycopg2.connect(user='postgres', password='password',
                                 host='database', port="5432", database='mydatabase')
    if not self.conn:
        logging.info("retry to connect")
        sleep(5)
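As a side note, psycopg2.connect raises an exception instead of returning None, so the check above never actually triggers a retry. A minimal sketch of a loop that does wait for the database, assuming the same credentials as above:

import logging
from time import sleep

import psycopg2

conn = None
for attempt in range(5):
    try:
        conn = psycopg2.connect(user='postgres', password='password',
                                host='database', port='5432', database='mydatabase')
        break
    except psycopg2.OperationalError:
        # Covers both "could not translate host name" and "connection refused"
        # while the database container is still starting up.
        logging.info("retry to connect (attempt %d)", attempt + 1)
        sleep(5)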
The weird thing is that when running it with docker-compose -f docker-compose.yml up, everything works fine.
But when I build the image (docker build -t myapp:0.1) and run it (docker run myapp:0.1), it gives me the following error:
File "/application/libraries/database.py", line 18, in establishConnection
self.conn = psycopg2.connect(user=CONFIG.DATABASE_USER, password=CONFIG.DATABASE_PASSWORD,
File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 127, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not translate host name "database" to address: Name or service not known
I've read that when using docker-compose a single network is created, so I guess this can't be the error here (see the Docker Compose docs).
Thanks in advance,
Jacky
If you run docker run on an image, it does only what's on that command line and no more. In particular, plain docker commands don't know about your docker-compose.yml or any of the settings you might specify there.
The short answer is to always use docker-compose up to launch the containers.
In principle you could translate the docker-compose.yml file into explicit docker commands. At a minimum you'd need to manually create a Docker network and specify the container names:
docker network create app-net
docker run -d --net app-net --name database -p 5432:5432 postgres
docker run -d --net app-net -e PGHOST=database -p 5001:5001 myapp:0.1
This hasn't included the other options in the Compose setup, though, notably database persistence. There are enough settings that you'd want to write them down in a file, and at that point you've basically reconstructed the Compose environment.
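On the application side, one way to keep the same code working under both docker-compose and a hand-rolled docker run is to read the database host from the environment. A minimal sketch, assuming psycopg2 and the credentials from the compose file; the PGHOST/PGPORT variable names mirror the -e PGHOST=database flag above and are otherwise an assumption, not something the original application necessarily uses:

import os
import psycopg2

# The host defaults to the compose service name "database" but can be
# overridden with -e PGHOST=... when the container is started by hand.
conn = psycopg2.connect(
    host=os.environ.get('PGHOST', 'database'),
    port=os.environ.get('PGPORT', '5432'),
    user='postgres',
    password='password',
    dbname='mydatabase',
)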

docker-compose and connection to Mongo container

I am trying to create 2 containers as per the following docker-compose.yml file. The issue is that if I start up the Mongo database container and then run my code locally (hitting 127.0.0.1), everything is fine; but if I try to run my API container and hit that (see yml file), I get connection refused, i.e.
172.29.0.12:27117: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60437a460a3e0fa904650e35, topology_type: Single, servers: [<ServerDescription ('172.29.0.12', 27117) server_type: Unknown, rtt: None, error=AutoReconnect('172.29.0.12:27117: [Errno 111] Connection refused')>]>
Please note: I have set mongo to use port 27117 rather than 27017
My app is a Python Flask app and I am using PyMongo in the following manner:
import pymongo

try:
    myclient = pymongo.MongoClient('mongodb://%s:%s@%s:%s/%s' % (username, password, hostName, port, database))
    mydb = myclient[database]
    cursor = mydb["temperatures"]
    app.logger.info('Database connected to: ' + database)
except Exception:
    app.logger.error('Error connecting to database')
What's driving me mad is it runs locally and successfully accesses mongo via the container, but as soon as I try the app in a container it fails.
docker-compose.yml as follows:
version: '3.7'

services:
    hotbin-db:
        image: mongo
        container_name: hotbin-db
        restart: always
        ports:
            # <port exposed on host> : <Mongo port running inside container>
            - '27117:27017'
        expose:
            # Declares port 27117 on the container
            - '27117'
        command: [--auth]
        environment:
            MONGO_INITDB_ROOT_USERNAME: ***
            MONGO_INITDB_ROOT_PASSWORD: ***
            MONGO_INITDB_DATABASE: ***
            MONGODB_DATA_DIR: /data/db
            MONDODB_LOG_DIR: /dev/null
        # Where our data will be persisted
        volumes:
            - /home/simon/mongodb/database/hotbin-db/:/data/db
            #- my-db:/var/lib/mysql
        # env_file:
        #     - .env
        networks:
            hotbin-net:
                ipv4_address: 172.29.0.12

    hotbin-api:
        image: scsherlock/compost-api:latest
        container_name: hotbin-api
        environment:
            MONGODB_DATABASE: ***
            MONGODB_USERNAME: ***
            MONGODB_PASSWORD: ***
            MONGODB_HOSTNAME: 172.29.0.12
            MONGODB_PORT: '27117'
        depends_on:
            - hotbin-db
        restart: always
        ports:
            # <port exposed on host> : <API port running inside container>
            - '5050:5050'
        expose:
            - '5050'
        networks:
            hotbin-net:
                ipv4_address: 172.29.0.13

# Names our volume
volumes:
    my-db:

networks:
    hotbin-net:
        driver: bridge
        ipam:
            driver: default
            config:
                - subnet: 172.29.0.0/16
Using the service name of the mongo container and the standard port of 27017 instead of 27117 (even though that's what is defined in the docker-compose file) works. I'd like to understand why, though.
Your docker-compose file does NOT configure MongoDB to run on port 27117. If you want it to run on 27117 you would have to change the command line in the compose file to:
command: mongod --auth --port 27117
As you haven't specified a port, MongoDB will run on the default port 27017.
Your expose section declares container port 27117, but Mongo isn't running on that port, so that line is effectively doing nothing.
Your ports section maps a host port 27117 to a container port 27017. This means if you're connecting from the host, you can connect on port 27117, but that is connecting to port 27017 on the container.
Now to your Python program. As it is running on the container network, you connect to other services within a docker-compose network by referencing them by their service name.
Putting this together, your connection string will be: mongodb://hotbin-db:27017/yourdb?<options>
As others have mentioned, you really don't need to assign specific IP addresses unless you have a very good reason to. You don't even need to define a network, as docker-compose creates its own internal network.
Reference: https://docs.docker.com/compose/networking/
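As a rough illustration, a PyMongo connection using that service-name string could look like the sketch below. It assumes the MONGODB_* environment variables from the compose file are available inside the API container; the URI includes authSource=admin because the compose file creates a root user, so drop or change that option if your user lives in another database.

import os
import pymongo

client = pymongo.MongoClient(
    'mongodb://%s:%s@hotbin-db:27017/%s?authSource=admin' % (
        os.environ['MONGODB_USERNAME'],
        os.environ['MONGODB_PASSWORD'],
        os.environ['MONGODB_DATABASE'],
    )
)
db = client[os.environ['MONGODB_DATABASE']]
print(db.list_collection_names())  # fails fast if the connection details are wrong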
Are you using Windows to run the container?
If yes, localhost is interpreted as the localhost of the container and not the localhost of your host machine.
Hence, instead of providing the IP address of your host, try modifying your MongoDB string this way when running inside the docker container:
Try this:
mongodb://host.docker.internal:27017/
instead of:
mongodb://localhost:27017/
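A minimal sketch of that variant, assuming Docker Desktop on Windows or Mac where host.docker.internal resolves to the host machine:

import pymongo

# host.docker.internal resolves to the host machine from inside the container,
# whereas localhost would resolve to the container itself.
client = pymongo.MongoClient('mongodb://host.docker.internal:27017/')
print(client.server_info()['version'])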

How to connect a Python app in a Docker container to another running Docker container via URL

I have an app in python that I want to run in a docker container and it has a line:
h2o.connect(ip='127.0.0.1', port='54321')
The h2o server is running in a Docker container and it always gets a different IP. One time it was started on 172.19.0.5, another time 172.19.0.3, sometimes 172.17.0.3.
So it is always random, and I can't connect the Python app.
I tried to expose the port of the h2o server to localhost and then connect the Python code (the code above), but it is not working.
You don't connect two Docker containers through IP addresses. Instead, you want to use Docker's internal network aliases:
version: '3'

services:
    server:
        ...
        depends_on:
            - database
    database:
        ...
        expose:
            - "54321"
then you can define your connection in server as:
h2o.connect(ip='database', port='54321')

Celery workers unable to connect to redis on docker instances

I have a dockerized setup running a Django app within which I use Celery tasks. Celery uses Redis as the broker.
Versioning:
Docker version 17.09.0-ce, build afdb6d4
docker-compose version 1.15.0, build e12f3b9
Django==1.9.6
django-celery-beat==1.0.1
celery==4.1.0
celery[redis]
redis==2.10.5
Problem:
My celery workers appear to be unable to connect to the redis container located at localhost:6379. I am able to telnet into the redis server on the specified port. I am able to verify redis-server is running on the container.
When I manually connect to the Celery docker instance and attempt to create a worker using the command celery -A backend worker -l info I get the notice:
[2017-11-13 18:07:50,937: ERROR/MainProcess] consumer: Cannot connect to redis://localhost:6379/0: Error 99 connecting to localhost:6379. Cannot assign requested address..
Trying again in 4.00 seconds...
Notes:
I am able to telnet in to the redis container on port 6379. On the redis container, redis-server is running.
Is there anything else that I'm missing? I've gone pretty far down the rabbit hole, but feel like I'm missing something really simple.
DOCKER CONFIG FILES:
docker-compose.common.yml here
docker-compose.dev.yml here
When you use docker-compose, you aren't going to be using localhost for inter-container communication; you would be using the compose-assigned hostname of the container. In this case, the hostname of your redis container is redis. The top-level elements under services: are your default host names.
So for celery to connect to redis, you should try redis://redis:6379/0. Since the protocol and the service name are the same, I'll elaborate a little more: if you named your redis service "butter-pecan-redis" in your docker-compose, you would instead use redis://butter-pecan-redis:6379/0.
Also, docker-compose.dev.yml doesn't appear to have celery and redis on a common network, which might cause them not to be able to see each other. I believe they need to share at least one network in common to be able to resolve their respective host names.
Networking in docker-compose has an example in the first handful of paragraphs, with a docker-compose.yml to look at.
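As a rough sketch of where that URL ends up, assuming the Celery app module is named backend as suggested by the worker command (celery -A backend worker):

from celery import Celery

# "redis" is the compose service name, not localhost.
app = Celery('backend',
             broker='redis://redis:6379/0',
             backend='redis://redis:6379/0')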
You may need to add the link and depends_on sections to your docker compose file, and then reference the containers by their hostname.
Updated docker-compose.yml:
version: '2.1'

services:
    db:
        image: postgres
    memcached:
        image: memcached
    redis:
        image: redis
        ports:
            - '6379:6379'
    backend-base:
        build:
            context: .
            dockerfile: backend/Dockerfile-base
        image: "/backend:base"
    backend:
        build:
            context: .
            dockerfile: backend/Dockerfile
        image: "/backend:${ENV:-local}"
        command: ./wait-for-it.sh db:5432 -- gunicorn backend.wsgi:application -b 0.0.0.0:8000 -k gevent -w 3
        ports:
            - 8000
        links:
            - db
            - redis
            - memcached
        depends_on:
            - db
            - redis
            - memcached
    celery:
        image: "/backend:${ENV:-local}"
        command: ./wait-for-it.sh db:5432 -- celery worker -E -B --loglevel=INFO --concurrency=1
        environment:
            C_FORCE_ROOT: "yes"
        links:
            - db
            - redis
            - memcached
        depends_on:
            - db
            - redis
            - memcached
    frontend-base:
        build:
            context: .
            dockerfile: frontend/Dockerfile-base
            args:
                NPM_REGISTRY: http://.view.build
                PACKAGE_INSTALLER: yarn
        image: "/frontend:base"
        links:
            - db
            - redis
            - memcached
        depends_on:
            - db
            - redis
            - memcached
    frontend:
        build:
            context: .
            dockerfile: frontend/Dockerfile
        image: "/frontend:${ENV:-local}"
        command: 'bash -c ''gulp'''
        working_dir: /app/user
        environment:
            PORT: 3000
        links:
            - db
            - redis
            - memcached
        depends_on:
            - db
            - redis
            - memcached
Then configure the URLs to redis, postgres, memcached, etc. with:
redis://redis:6379/0
postgres://user:pass@db:5432/database
The issue for me was that all of the containers, including celery, had a networks argument specified. If this is the case, the redis container must also have the same argument, otherwise you will get this error. See below; the fix was adding 'networks':
redis:
    image: redis:alpine
    ports:
        - '6379:6379'
    networks:
        - server

Docker cannot connect application to MySQL

I am trying to run integration tests (in Python) which depend on MySQL. Currently they depend on MySQL running locally, but I want them to depend on a MySQL instance running in Docker.
Contents of Dockerfile:
FROM continuumio/anaconda3:4.3.1
WORKDIR /opt/workdir
ADD . /opt/workdir
RUN python setup.py install
Contents of Docker Compose:
version: '2'

services:
    mysql:
        image: mysql:5.6
        container_name: test_mysql_container
        environment:
            - MYSQL_ROOT_PASSWORD=test
            - MYSQL_DATABASE=My_Database
            - MYSQL_USER=my_user
            - MYSQL_PASSWORD=my_password
        volumes:
            - db_data:/var/lib/mysql
        restart: always
        expose:
            - "3306"
    my_common_package:
        image: my_common_package
        depends_on:
            - mysql
        restart: always
        links:
            - mysql

volumes:
    db_data:
Now, I try to run the tests in my package using:
docker-compose run my_common_package python testsql.py
and I receive the error:
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'localhost' ([Errno 99] Cannot assign requested address)")
docker-compose will by default create a virtual network where all the containers/services in the compose file can reach each other by IP address. By using links, depends_on or network aliases they can reach each other by host name. In your case the host name is the service name, but this can be overridden (see: docs).
Your script in the my_common_package container/service should then connect to host mysql on port 3306 according to your setup (not localhost on port 3306).
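As a rough sketch, testsql.py could then connect like this, assuming PyMySQL and the credentials from the compose file; "mysql" is the compose service name and only resolves from inside the compose network, not from the host:

import pymysql

conn = pymysql.connect(host='mysql', port=3306,
                       user='my_user', password='my_password',
                       database='My_Database')
with conn.cursor() as cur:
    cur.execute('SELECT 1')
    print(cur.fetchone())
conn.close()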
Also note that using expose is only necessary if the Dockerfile for the service doesn't have an EXPOSE statement. The standard mysql image already does this.
If you want to map a container port to localhost you need to use ports, but only do this if it's necessary.
services:
    mysql:
        image: mysql:5.6
        container_name: test_mysql_container
        environment:
            - MYSQL_ROOT_PASSWORD=test
            - MYSQL_DATABASE=My_Database
            - MYSQL_USER=my_user
            - MYSQL_PASSWORD=my_password
        volumes:
            - db_data:/var/lib/mysql
        ports:
            - "3306:3306"
Here we are saying that port 3306 in the mysql container should be mapped to localhost on port 3306.
Now you can connect to mysql using localhost:3306 outside of docker. For example you can try to run your testsql.py locally (NOT in a container).
Container to container communication will always happen using the host name of each container. Think of containers as virtual machines.
You can even find the network docker-compose created using docker network list:
1b1a54630639 myproject_default bridge local
82498fd930bb bridge bridge local
.. then use docker network inspect <id> to look at the details.
IP addresses assigned to containers can be fairly random, so the only viable way for container-to-container communication is using hostnames.
