Airflow getting error No module named 'httplib2' - python

I am getting this error when starting Airflow with Docker. The package is already installed in my Anaconda environment. I am new to Airflow and following this course; under the video there is a link to the GitHub repository with the code I am using to recreate the task:
https://www.youtube.com/watch?v=wAyu5BN3VpY&t=717s [29:30]
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    image: puckel/docker-airflow:1.10.1
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.1
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
        PYTHON_DEPS: sqlalchemy==1.2.0
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./examples/gcloud-example/dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

And the startup script:

#!/bin/bash
docker-compose -f docker-compose-gcloud.yml up --abort-on-container-exit
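One thing worth trying (a sketch, not a confirmed fix): since the ModuleNotFoundError comes from inside the container rather than from the local Anaconda environment, the missing package could be added through the PYTHON_DEPS build argument this compose file already passes to the puckel/docker-airflow build, assuming that argument is handed straight to pip install (check the image's Dockerfile for the exact separator it expects):

webserver:
  build:
    context: https://github.com/puckel/docker-airflow.git#1.10.1
    dockerfile: Dockerfile
    args:
      AIRFLOW_DEPS: gcp_api,s3
      # assumption: extra pip packages can be appended to PYTHON_DEPS
      PYTHON_DEPS: sqlalchemy==1.2.0 httplib2

Because the service declares both image: and build:, the image has to be rebuilt for the new argument to take effect, e.g. docker-compose -f docker-compose-gcloud.yml build (or up --build).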

Related

APM Server has still not connected to Elasticsearch in docker-compose

I have the following docker-compose in order to store logs from FastAPI:
apm-server:
  image: docker.elastic.co/apm/apm-server:7.13.0
  cap_add: ["CHOWN", "DAC_OVERRIDE", "SETGID", "SETUID"]
  cap_drop: ["ALL"]
  networks:
    - es-net
  ports:
    - 8200:8200
  command: >
    apm-server -e
      -E apm-server.rum.enabled=true
      -E setup.kibana.host=kibana:5601
      -E setup.template.settings.index.number_of_replicas=0
      -E apm-server.kibana.enabled=true
      -E apm-server.kibana.host=kibana:5601
      -E output.elasticsearch.hosts=["elasticsearch:9200"]
  healthcheck:
    interval: 10s
    retries: 12
    test: curl --write-out 'HTTP %{http_code}' --fail --silent --output /dev/null http://localhost:8200/
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
  container_name: elasticsearch
  environment:
    - xpack.security.enabled=false
    - discovery.type=single-node
  networks:
    - es-net
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  cap_add:
    - IPC_LOCK
  volumes:
    - elasticsearch-data:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
    - 9300:9300
kibana:
  container_name: kibana
  image: docker.elastic.co/kibana/kibana:7.4.0
  networks:
    - es-net
  environment:
    - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
  ports:
    - 5601:5601
  depends_on:
    - elasticsearch
volumes:
  elasticsearch-data:
    driver: local
networks:
  es-net:
    driver: bridge
The problem is that Kibana doesn't recognise the APM service.
The FastAPI app is configured with the following:

from elasticapm.contrib.starlette import make_apm_client, ElasticAPM
from fastapi import FastAPI

apm = make_apm_client({
    'SERVICE_NAME': 'service'
})

app = FastAPI()
app.add_middleware(ElasticAPM, client=apm)

Any idea how I can solve this?
Thanks
The problem was solved by changing the apm-server image to 7.4.0.
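A minimal sketch of that change, assuming the rest of the service definition stays the same, so the APM Server version matches the Elasticsearch and Kibana 7.4.0 images already in the file:

apm-server:
  # match the Elasticsearch/Kibana version (7.4.0) instead of 7.13.0
  image: docker.elastic.co/apm/apm-server:7.4.0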

Run a command in Docker after certain container is up

I have a Docker Compose setup with 4 containers.
How do I run a Python script (python manage.py setup) in the server_1 container once postgres_1 is up, but only once (the state should be persisted somewhere, maybe via a volume?).
I persist PostgreSQL data to disk via a volume.
I want setup and running of the software to be very easy, just docker-compose up. It should not matter whether this is the first run or a later run; only the first run needs the python manage.py setup invocation.
Is there a nice way of doing this?
My idea was to check for the existence of a flag file in a mounted volume, but I don't know how to wait in server_1 for postgres_1 to be up (see the readiness sketch after the second compose file below).
Here is my docker-compose.yml:
version: '3'
services:
  server:
    build:
      context: .
      dockerfile: docker/backend/Dockerfile
    restart: always
    working_dir: /srv/scanmycode/
    entrypoint: python
    command: /srv/scanmycode/manage.py runserver
    ports:
      - 5000:5000
    volumes:
      - ./data1:/srv/scanmycode/quantifiedcode/data/
      - ./data2:/srv/scanmycode/quantifiedcode/backend/data/
    links:
      - "postgres"
  postgres:
    image: postgres:13.2
    restart: unless-stopped
    environment:
      POSTGRES_DB: qc
      POSTGRES_USER: qc
      POSTGRES_PASSWORD: qc
      PGDATA: /var/lib/postgresql/data/pgdata
    ports:
      - "5432:5432"
    volumes:
      - type: bind
        source: ./postgres-data
        target: /var/lib/postgresql/data
  worker_1:
    build:
      context: .
      dockerfile: docker/worker/Dockerfile
      args:
        - GIT_TOKEN
    hostname: worker_1
    restart: on-failure
    depends_on:
      - rabbitmq3
    working_dir: /srv/scanmycode/
    entrypoint: python
    command: /srv/scanmycode/manage.py runworker
    volumes:
      - ./data1:/srv/scanmycode/quantifiedcode/data/
      - ./data2:/srv/scanmycode/quantifiedcode/backend/data/
    links:
      - "rabbitmq3"
      - "server"
      - "postgres"
  rabbitmq3:
    container_name: "rabbitmq"
    image: rabbitmq:3.8-management-alpine
    environment:
      - RABBITMQ_DEFAULT_USER=qc
      - RABBITMQ_DEFAULT_PASS=qc
    ports:
      - 5672:5672
      - 15672:15672
    healthcheck:
      test: [ "CMD", "nc", "-z", "localhost", "5672" ]
      interval: 5s
      timeout: 15s
      retries: 1
I ended up using this:
version: '3'
services:
  server:
    build:
      context: .
      dockerfile: docker/backend/Dockerfile
    restart: always
    depends_on:
      - postgres
    working_dir: /srv/scanmycode/
    entrypoint: sh
    command: -c "if [ -f /srv/scanmycode/setup_state/setup_done ]; then python /srv/scanmycode/manage.py runserver; else python /srv/scanmycode/manage.py setup && mkdir -p /srv/scanmycode/setup_state && touch /srv/scanmycode/setup_state/setup_done; fi"
    ports:
      - 5000:5000
    volumes:
      - ./data1:/srv/scanmycode/quantifiedcode/data/
      - ./data2:/srv/scanmycode/quantifiedcode/backend/data/
      - ./setup_state:/srv/scanmycode/setup_state
    links:
      - "postgres"
  postgres:
    image: postgres:13.2
    restart: unless-stopped
    environment:
      POSTGRES_DB: qc
      POSTGRES_USER: qc
      POSTGRES_PASSWORD: qc
      PGDATA: /var/lib/postgresql/data/pgdata
    ports:
      - "5432:5432"
    volumes:
      - db-data:/var/lib/postgresql/data
  worker_1:
    build:
      context: .
      dockerfile: docker/worker/Dockerfile
    hostname: worker_1
    restart: on-failure
    depends_on:
      - rabbitmq3
      - postgres
      - server
    working_dir: /srv/scanmycode/
    entrypoint: python
    command: /srv/scanmycode/manage.py runworker
    volumes:
      - ./data1:/srv/scanmycode/quantifiedcode/data/
      - ./data2:/srv/scanmycode/quantifiedcode/backend/data/
    links:
      - "rabbitmq3"
      - "server"
      - "postgres"
  rabbitmq3:
    container_name: "rabbitmq"
    image: rabbitmq:3.8-management-alpine
    environment:
      - RABBITMQ_DEFAULT_USER=qc
      - RABBITMQ_DEFAULT_PASS=qc
    ports:
      - 5672:5672
      - 15672:15672
    healthcheck:
      test: [ "CMD", "nc", "-z", "localhost", "5672" ]
      interval: 5s
      timeout: 15s
      retries: 1
volumes:
  db-data:
    driver: local
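For the "wait in server_1 for postgres_1 to be up" part of the question, one common pattern is a readiness healthcheck on the database gated by a depends_on condition. This is a sketch, assuming a Compose version that supports depends_on conditions (file format 2.1-2.4, or the newer Docker Compose CLI with the Compose Specification):

services:
  postgres:
    image: postgres:13.2
    healthcheck:
      # pg_isready ships with the official postgres image
      test: ["CMD-SHELL", "pg_isready -U qc -d qc"]
      interval: 5s
      timeout: 5s
      retries: 10
  server:
    depends_on:
      postgres:
        condition: service_healthy  # start the server only once the DB accepts connections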

Running Spark on Docker using bitnami image

I'm trying to run Spark in a Docker container from a Python app which is located in another container:
version: '3'
services:
  spark-master:
    image: docker.io/bitnami/spark:2
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - type: bind
        source: ./conf/log4j.properties
        target: /opt/bitnami/spark/conf/log4j.properties
    ports:
      - '8080:8080'
      - '7077:7077'
    networks:
      - spark
    container_name: spark
  spark-worker-1:
    image: docker.io/bitnami/spark:2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - type: bind
        source: ./conf/log4j.properties
        target: /opt/bitnami/spark/conf/log4j.properties
    ports:
      - '8081:8081'
    container_name: spark-worker
    networks:
      - spark
    depends_on:
      - spark-master
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181
    container_name: zookeeper
    networks:
      - rmoff_kafka
  kafka:
    image: confluentinc/cp-kafka:5.5.0
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    container_name: kafka
    networks:
      - rmoff_kafka
  app:
    build:
      context: ./
    depends_on:
      - kafka
    ports:
      - 5000:5000
    container_name: app
    networks:
      - rmoff_kafka
networks:
  spark:
    driver: bridge
  rmoff_kafka:
    name: rmoff_kafka
When I try to create a SparkSession:
import os

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.setAll(
    [
        (
            "spark.master",
            os.environ.get("SPARK_MASTER_URL", "spark://spark:7077"),
        ),
        ("spark.driver.host", os.environ.get("SPARK_DRIVER_HOST", "local[*]")),
        ("spark.submit.deployMode", "client"),
        ("spark.ui.showConsoleProgress", "true"),
        ("spark.driver.bindAddress", "0.0.0.0"),
        ("spark.app.name", app_name),
    ]
)
spark_session = SparkSession.builder.config(conf=conf).getOrCreate()
I get an error related to Java:
JAVA_HOME is not set
Exception: Java gateway process exited before sending its port number
I suppose I have to install Java or set the JAVA_HOME environment variable, but I don't know exactly how to tackle the problem. Should I install Java in the Spark container or in the container from which I run the Python script?
Add the Java installation to your app's Dockerfile:

# Install OpenJDK-11
RUN apt-get update && \
    apt-get install -y openjdk-11-jre-headless && \
    apt-get clean
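If the JAVA_HOME warning persists, it can also help to set it explicitly in the same Dockerfile. The path below is an assumption for a Debian/Ubuntu-based amd64 image, where openjdk-11-jre-headless installs under /usr/lib/jvm; adjust it for your base image:

# Assumed install location on Debian/Ubuntu amd64; verify inside your image
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ENV PATH="${JAVA_HOME}/bin:${PATH}"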

Docker compose file for airflow 2 ( version 2.0.0 )

I am looking to write a docker-compose file to run Airflow locally in a production-like environment.
For the older Airflow v1.10.14 this docker-compose works fine, but the same docker-compose does not work for the latest stable version: the Airflow scheduler and webserver keep failing, with an error message along the lines of "unable to create audit tables".
docker-compose.yaml:
version: "2.1"
services:
postgres:
image: postgres:12
environment:
- POSTGRES_USER=airflow
- POSTGRES_PASSWORD=airflow
- POSTGRES_DB=airflow
ports:
- "5433:5432"
scheduler:
image: apache/airflow:1.10.14
restart: always
depends_on:
- postgres
- webserver
env_file:
- .env
ports:
- "8793:8793"
volumes:
- ./dags:/opt/airflow/dags
- ./airflow-logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
command: scheduler
healthcheck:
test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
interval: 30s
timeout: 30s
retries: 5
webserver:
image: apache/airflow:1.10.14
hostname: webserver
restart: always
depends_on:
- postgres
env_file:
- .env
volumes:
- ./dags:/opt/airflow/dags
- ./scripts:/opt/airflow/scripts
- ./airflow-logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
ports:
- "8080:8080"
entrypoint: ./scripts/airflow-entrypoint.sh
healthcheck:
test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
interval: 30s
timeout: 30s
retries: 5
.env:
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres+psycopg2://airflow:airflow@postgres:5432/airflow
AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://airflow:airflow@postgres:5432/airflow
AIRFLOW_VAR__METADATA_DB_SCHEMA=airflow
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=10
./scripts/airflow-entrypoint.sh:
#!/usr/bin/env bash
airflow upgradedb
airflow webserver
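A likely reason the same entrypoint fails on 2.0 is that the CLI changed: airflow upgradedb was renamed to airflow db upgrade in Airflow 2.0, and 2.0 also requires a user to be created explicitly for the UI. A sketch of an entrypoint adjusted for 2.0 (the admin credentials below are placeholders):

#!/usr/bin/env bash
# Airflow 2.0 renamed "upgradedb" to "db upgrade"
airflow db upgrade
# Airflow 2.0 ships with no default user; create one (placeholder credentials)
airflow users create --role Admin --username admin --password admin \
  --firstname Admin --lastname User --email admin@example.com
exec airflow webserver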
There is an official docker-compose.yml, see here.
You will also find more information about Docker and Kubernetes deployment in the official docs.
An official Docker image for Airflow version 2.0 is available now. Here is the list of 2.0.0 Docker images.
Example Docker image:
# example:
apache/airflow:2.0.0-python3.8
The docker-compose below handles the Airflow DB schema upgrade and admin user creation.
Here is a version that I use:
version: "3.7"
# Common sections extracted out
x-airflow-common:
&airflow-common
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.0.0-python3.8}
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow#postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow#postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:#redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
# Change log level when needed
AIRFLOW__LOGGING__LOGGING_LEVEL: 'INFO'
volumes:
- ./airflow/dags:/opt/airflow/dags
depends_on:
- postgres
- redis
networks:
- default_net
volumes:
postgres-db-volume:
services:
postgres:
image: postgres:13
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
retries: 5
restart: always
networks:
- default_net
redis:
image: redis:6.0.10
ports:
- 6379:6379
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 30s
retries: 50
restart: always
networks:
- default_net
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
environment:
<<: *airflow-common-env
# It is sufficient to run db upgrade and create admin user only from webserver service
_AIRFLOW_WWW_USER_PASSWORD: 'yourAdminPass'
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
airflow-scheduler:
<<: *airflow-common
command: scheduler
restart: always
airflow-worker:
<<: *airflow-common
command: celery worker
restart: always
networks:
default_net:
attachable: true
Also update the docker-compose file format version to version: "3.7".

How to run Python Django and Celery using docker-compose?

I have a Python application using Django and Celery, and I am trying to run it with Docker and docker-compose because I am also using Redis and DynamoDB.
The problem is the following:
I am not able to run both the WSGI and Celery services, because only the first instruction works: the Celery worker stays in the foreground, so the uwsgi command chained after && never executes.
version: '3.3'
services:
  redis:
    image: redis:3.2-alpine
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
  dynamodb:
    image: dwmkerr/dynamodb
    ports:
      - "3000:8000"
    volumes:
      - dynamodb_data:/data
  jobs:
    build:
      context: nubo-async-cfe-seces
      dockerfile: Dockerfile
    environment:
      - REDIS_HOST=redisrvi
      - PYTHONUNBUFFERED=0
      - CC_DYNAMODB_NAMESPACE=None
      - CC_DYNAMODB_ACCESS_KEY_ID=anything
      - CC_DYNAMODB_SECRET_ACCESS_KEY=anything
      - CC_DYNAMODB_HOST=dynamodb
      - CC_DYNAMODB_PORT=8000
      - CC_DYNAMODB_IS_SECURE=False
    command: >
      bash -c "celery worker -A tasks.async_service -Q dynamo-queue -E --loglevel=ERROR &&
               uwsgi --socket 0.0.0.0:8080 --protocol=http --wsgi-file nubo_async/wsgi.py"
    depends_on:
      - redis
      - dynamodb
    volumes:
      - .:/jobs
    ports:
      - "9090:8080"
volumes:
  redis_data:
  dynamodb_data:
Has anyone had the same problem?
You may refer to the docker-compose of the Saleor project. I would suggest letting Celery run its daemon in its own container, depending only on Redis as the broker. See the configuration in the docker-compose.yml file:
services:
  web:
    build:
      context: .
      dockerfile: ./Dockerfile
      args:
        STATIC_URL: '/static/'
    restart: unless-stopped
    networks:
      - saleor-backend-tier
    env_file: common.env
    depends_on:
      - db
      - redis
  celery:
    build:
      context: .
      dockerfile: ./Dockerfile
      args:
        STATIC_URL: '/static/'
    command: celery -A saleor worker --app=saleor.celeryconf:app --loglevel=info
    restart: unless-stopped
    networks:
      - saleor-backend-tier
    env_file: common.env
    depends_on:
      - redis
Note also that the connections from both services to Redis are configured separately by environment variables, as shown in the common.env file:
CACHE_URL=redis://redis:6379/0
CELERY_BROKER_URL=redis://redis:6379/1
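For completeness, a minimal sketch of how the Celery side typically picks up that broker URL. This is illustrative rather than the actual Saleor source, although the module name mirrors the saleor.celeryconf:app path used in the command above:

# celeryconf.py (illustrative sketch)
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "saleor.settings")

app = Celery("saleor")
# CELERY_BROKER_URL comes from common.env (redis://redis:6379/1)
app.conf.broker_url = os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/1")
app.autodiscover_tasks()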
Here's the docker-compose as suggested by @Satevg, running the Django and Celery applications in separate containers. Works fine!
version: '3.3'
services:
  redis:
    image: redis:3.2-alpine
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
  dynamodb:
    image: dwmkerr/dynamodb
    ports:
      - "3000:8000"
    volumes:
      - dynamodb_data:/data
  jobs:
    build:
      context: nubo-async-cfe-services
      dockerfile: Dockerfile
    environment:
      - REDIS_HOST=redis
      - PYTHONUNBUFFERED=0
      - CC_DYNAMODB_NAMESPACE=None
      - CC_DYNAMODB_ACCESS_KEY_ID=anything
      - CC_DYNAMODB_SECRET_ACCESS_KEY=anything
      - CC_DYNAMODB_HOST=dynamodb
      - CC_DYNAMODB_PORT=8000
      - CC_DYNAMODB_IS_SECURE=False
    command: bash -c "uwsgi --socket 0.0.0.0:8080 --protocol=http --wsgi-file nubo_async/wsgi.py"
    depends_on:
      - redis
      - dynamodb
    volumes:
      - .:/jobs
    ports:
      - "9090:8080"
  celery:
    build:
      context: nubo-async-cfe-services
      dockerfile: Dockerfile
    environment:
      - REDIS_HOST=redis
      - PYTHONUNBUFFERED=0
      - CC_DYNAMODB_NAMESPACE=None
      - CC_DYNAMODB_ACCESS_KEY_ID=anything
      - CC_DYNAMODB_SECRET_ACCESS_KEY=anything
      - CC_DYNAMODB_HOST=dynamodb
      - CC_DYNAMODB_PORT=8000
      - CC_DYNAMODB_IS_SECURE=False
    command: celery worker -A tasks.async_service -Q dynamo-queue -E --loglevel=ERROR
    depends_on:
      - redis
      - dynamodb
    volumes:
      - .:/jobs
volumes:
  redis_data:
  dynamodb_data:
