I have the following docker-compose file:
version: '3.1'
services:
  postgres_db:
    image: postgres
    restart: always
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: default_db
    ports:
      - 54320:5432
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CREATE_TOPICS: "test:1:1"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
After running it with docker-compose up, everything looks fine in the terminal output.
I start a python console and run the following lines:
import json
from json import dumps
from kafka import KafkaConsumer, KafkaProducer

kp = KafkaProducer(bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                   value_serializer=lambda x: dumps(x).encode('utf-8'))
kc = KafkaConsumer('test', bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                   group_id=None, auto_offset_reset='earliest',
                   value_deserializer=lambda json_data: json.loads(json_data.decode('utf-8')))
data = {"test": "test"}
kp.send(topic="test", value=data)
for message in kc:
    print(message.value)
However, after running this, the console simply hangs, and it doesn't look like the message was produced or consumed. Any ideas what went wrong here? Thanks!
Either you need to run your Python code in a container on the same Compose network and set
bootstrap_servers=['kafka:9092']
or you need to advertise Kafka back to the clients on your host machine:
KAFKA_ADVERTISED_HOST_NAME: localhost
You can read the wurstmeister README on the usage of HOSTNAME_COMMAND as well.
I'd also recommend running the producer and consumer separately as you test them.
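For instance, a minimal pair of test scripts might look like this (a sketch assuming the KAFKA_ADVERTISED_HOST_NAME: localhost fix, so the broker is reachable from the host):

# producer.py -- run on the host machine
from json import dumps
from kafka import KafkaProducer

kp = KafkaProducer(bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                   value_serializer=lambda x: dumps(x).encode('utf-8'))
kp.send(topic='test', value={'test': 'test'})
kp.flush()  # block until the buffered message is actually delivered

# consumer.py -- run in a second terminal
import json
from kafka import KafkaConsumer

kc = KafkaConsumer('test', bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                   group_id=None, auto_offset_reset='earliest',
                   value_deserializer=lambda m: json.loads(m.decode('utf-8')))
for message in kc:
    print(message.value)

The flush() call is worth noting: without it, a short-lived producer process can exit before the batched message ever leaves the client, which looks exactly like a produce failure.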
I am new to both Docker and Selenium grid and am having issues getting my web app to connect to the selenium hub.
docker-compose.yml
version: '3.8'
services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: flask run --host 0.0.0.0
    volumes:
      - ..:/opt/code
    ports:
      - "5000:5000"
  chrome:
    image: selenium/node-chrome:4.0.0-20211013
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_GRID_URL=http://localhost:4444
    ports:
      - "6900:5900"
  edge:
    image: selenium/node-edge:4.0.0-20211013
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_GRID_URL=http://localhost:4444
    ports:
      - "6901:5900"
  firefox:
    image: selenium/node-firefox:4.0.0-20211013
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_GRID_URL=http://localhost:4444
    ports:
      - "6902:5900"
  selenium-hub:
    image: selenium/hub:4.0.0-20211013
    container_name: selenium-hub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
volumes:
  pgdata:
When running the stack of containers and checking netstat -a, I can see my desktop listening on port 4444, and when I kill the containers it no longer is.
I can also verify that the hub is running and all of my nodes are connecting fine by visiting http://localhost:4444. However, when I run driver = webdriver.Remote(command_executor="http://localhost:4444") from my Python Flask app (which is running in the container specified as web above), I get the error:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4444): Max retries
exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection
object at 0x7fd4855b7730>: Failed to establish a new connection: [Errno 111] Connection refused'))
I have tried specifying the desired capabilities to match the driver for specific nodes; however, I receive the same error regardless.
I am using Selenium's latest build "4.0.0" and, as you can see, 4.0.0 images for the parts of the grid, so I don't think it's a compatibility issue.
docker-compose ps
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------
development_chrome_1 /opt/bin/entry_point.sh Up 0.0.0.0:6900->5900/tcp,:::6900->5900/tcp
development_db_1 docker-entrypoint.sh postgres Up 0.0.0.0:5432->5432/tcp,:::5432->5432/tcp
development_edge_1 /opt/bin/entry_point.sh Up 0.0.0.0:6901->5900/tcp,:::6901->5900/tcp
development_firefox_1 /opt/bin/entry_point.sh Up 0.0.0.0:6902->5900/tcp,:::6902->5900/tcp
development_web_1 flask run --host 0.0.0.0 Up 0.0.0.0:5000->5000/tcp,:::5000->5000/tcp
selenium-hub /opt/bin/entry_point.sh Up 0.0.0.0:4442->4442/tcp,:::4442->4442/tcp,
0.0.0.0:4443->4443/tcp,:::4443->4443/tcp,
0.0.0.0:4444->4444/tcp,:::4444->4444/tcp
I feel like I'm fundamentally missing something here. Any thoughts?
I see the mistake now. I was mistakenly attempting to connect to http://localhost:4444 from my client, when I needed to specify the container name of the Selenium hub on the Docker network instead.
Fix
Change this line in your flask_app.py
driver = webdriver.Remote(command_executor="http://localhost:4444")
To:
driver = webdriver.Remote(command_executor="http://container-name:4444")
Where container-name is the selenium hub name set in docker-compose.yml:
selenium-hub:
  image: selenium/hub:4.0.0-20211013
  container_name: selenium-hub
  ports:
    - "4442:4442"
    - "4443:4443"
    - "4444:4444"
In my case: "selenium-hub".
Resource on docker networking used: https://docs.docker.com/compose/networking/
Final Thoughts
I guess I got tripped up by the fact that I am still using http://localhost:port to connect to both the grid hub and the web container from my host machine. The difference is where the client request comes from: from outside the Docker stack, the published ports make localhost work; from within the stack, a container has to address the hub by its container name. Anyway, hope this helps someone.
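To make that concrete, here is a small sketch of both cases (mirroring the webdriver.Remote call from above; the ChromeOptions argument is my addition, to target a Chrome node):

from selenium import webdriver

# From the host machine (outside the Compose network), the hub's published
# port makes it reachable on localhost:
driver = webdriver.Remote(command_executor="http://localhost:4444",
                          options=webdriver.ChromeOptions())

# From another container on the same Compose network (e.g. the web service),
# address the hub by its container name instead:
driver = webdriver.Remote(command_executor="http://selenium-hub:4444",
                          options=webdriver.ChromeOptions())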
Docker novice here.
I have committed new changes inside the application. These changes were copied from my local machine to the host machine, and then into the Docker container.
So I created a new image: sudo docker commit old_container_id new_image_name (in my case, djangotango-on-docker_web)
Then I spun up the Docker container using the new image:
sudo docker run --name djangotango-web -d --expose 8000 djangotango-on-docker_web gunicorn djangotango.wsgi:application --bind 0.0.0.0:8000
Here djangotango-on-docker_web is my newly created image.
But my application gives a 502 error after this. My new container is not wired into the stack properly.
docker-compose.yml
version: '3.8'

# networks:
#   public_network:
#     name: public_network
#     driver: bridge

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile.prod
    # image: <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/django-ec2:web
    command: gunicorn djangotango.wsgi:application --bind 0.0.0.0:8000
    volumes:
      # - .:/home/app/web/
      - static_volume:/home/app/web/static
      - media_volume:/home/app/web/media
    expose:
      - 8000
    env_file:
      - ./.env.staging
    networks:
      service_network:
  db:
    image: postgres:12.0-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    env_file:
      - ./.env.staging.db
    networks:
      service_network:
    # depends_on:
    #   - web
  # pgadmin:
  #   image: dpage/pgadmin4
  #   env_file:
  #     - ./.env.staging.db
  #   ports:
  #     - "8080:80"
  #   volumes:
  #     - pgadmin-data:/var/lib/pgadmin
  #   depends_on:
  #     - db
  #   links:
  #     - "db:pgsql-server"
  #   environment:
  #     - PGADMIN_DEFAULT_EMAIL=4652173624824872
  #     - PGADMIN_DEFAULT_PASSWORD=exampleeee
  #     - PGADMIN_LISTEN_PORT=80
  #   networks:
  #     service_network:
  nginx-proxy:
    build: nginx
    # image: <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/django-ec2:nginx-proxy
    restart: always
    ports:
      - 443:443
      - 80:80
    networks:
      service_network:
    volumes:
      - static_volume:/home/app/web/static
      - media_volume:/home/app/web/media
      - certs:/etc/nginx/certs
      - html:/usr/share/nginx/html
      - vhost:/etc/nginx/vhost.d
      - /var/run/docker.sock:/tmp/docker.sock:ro
    labels:
      - "com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy"
    depends_on:
      - web
  nginx-proxy-letsencrypt:
    image: jrcs/letsencrypt-nginx-proxy-companion
    env_file:
      - .env.staging.proxy-companion
    networks:
      service_network:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - certs:/etc/nginx/certs
      - html:/usr/share/nginx/html
      - vhost:/etc/nginx/vhost.d
    depends_on:
      - nginx-proxy

networks:
  service_network:

volumes:
  postgres_data:
  pgadmin-data:
  static_volume:
  media_volume:
  certs:
  html:
  vhost:
How do I do this the correct way? I'm running my production application on my domain name.
What I can understand from the logs is that my web container is no longer on the same network as the other containers.
I don't want to rebuild via docker-compose; that would solve the problem, but I guess it will increase the image size, plus it's not recommended as far as I know.
The correct approach here is to use only docker-compose commands, and to go ahead and rebuild your image:
docker-compose up --build --force-recreate web
Many of the options you'd need to recreate this with a plain docker run command are listed in the docker-compose.yml file, but some are generated implicitly. The docker run command you show doesn't have a --net option to attach to the Compose network (which would explain the error you're getting), it doesn't have the -v options to mount the static and media volumes over the image's files, and it doesn't load the settings from the .env.staging file.
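For illustration only, a docker run invocation roughly equivalent to the Compose service would need something like the following; the network and volume names here are guesses based on Compose's usual <project>_<name> prefixing, so check docker network ls and docker volume ls for the real ones:

docker run --name djangotango-web -d \
  --net djangotango-on-docker_service_network \
  --env-file ./.env.staging \
  -v djangotango-on-docker_static_volume:/home/app/web/static \
  -v djangotango-on-docker_media_volume:/home/app/web/media \
  --expose 8000 \
  djangotango-on-docker_web \
  gunicorn djangotango.wsgi:application --bind 0.0.0.0:8000

Keeping all of that in sync by hand is exactly why the docker-compose commands are the better tool here.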
You should almost never use docker commit either. What's the code change you made in your image, and how would your colleagues get and test that change? Especially with the mentions of "prod" here, running code in production that you haven't built from source and tested through your usual CI process is usually discouraged.
(In terms of image size, a committed image will always be larger than the original image; an image rebuilt with docker build starts from the base image and is generally smaller. Committing images also tends to lose options like the default command to run.)
I have built an environment with Docker Compose in order to run Robot Framework tests. The environment consists of a Django web app, Postgres, and a Robot Framework container. The problem I have is that I get many blank screens in different tests, while an external Django web app instance installed on a virtual machine doesn't have this problem.
The blank screens mean that elements are not found, hence so many failures:
JavascriptException: Message: javascript error: Cannot read property 'get' of undefined
(Session info: headless chrome=84.0.4147.89)
I am sure that the problem is with the Django app container itself, not the robot container, since, as said above, I have tested the same robot environment against a different web app installed outside Docker, and it worked.
docker-compose.yml:
version: "3.6"
services:
  redis:
    image: redis:3.2
    ports:
      - 6379
    networks:
      local:
        ipv4_address: 10.0.0.20
  smtpd:
    image: mysmtpd:1.0.5
    ports:
      - 25
    networks:
      - local
  postgres:
    image: mypostgres
    build:
      context: ../dias-postgres/
      args:
        VERSION: ${POSTGRES_TAG:-12}
    hostname: "postgres"
    environment:
      POSTGRES_DB: ${POSTGRES_USER}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    networks:
      local:
        ipv4_address: 10.0.0.100
    ports:
      - 5432
    volumes:
      - my-postgres:/var/lib/postgresql/data
  app:
    image: mypyenv:${PYENV_TAG:-1.1}
    tty: true
    stdin_open: true
    user: ${MY_USER:-jenkins}
    networks:
      local:
        ipv4_address: 10.0.0.50
    hostname: "app"
    ports:
      - 8000
    volumes:
      - ${WORKSPACE}:/app
    environment:
      ALLOW_HOST: "*"
      PGHOST: postgres
      PGUSER: ${POSTGRES_USER}
      PGDATABASE: ${POSTGRES_USER}
      PGPASSWORD: ${POSTGRES_PASSWORD}
      ANONYMIZE: "false"
      REDIS_HOST: redis
      REDIS_DB: 2
      APP_PATH: ${APP_PATH}
      APP: ${MANDANT}
      TIMER: ${TIMER:-20}
      EMAIL_BACKEND: "dias.core.log.mail.SmtpEmailBackend"
      EMAIL_HOST: "smtpd"
      EMAIL_PORT: "25"
  robot:
    image: myrobot:${ROBOT_TAG:-1.0.9}
    user: ${ROBOT_USER:-jenkins}
    networks:
      local:
        ipv4_address: 10.0.0.70
    volumes:
      - ${WORKSPACE}:/app
      - ${ROBOT_REPORTS_PATH}:/APP_Robot_Reports
    environment:
      APP_ROBOT: ${APP_ROBOT}
      TIMER: ${TIMER:-20}
      PGHOST: postgres
      PGUSER: ${POSTGRES_USER}
      PGDATABASE: ${POSTGRES_USER}
      PGPASSWORD: ${POSTGRES_PASSWORD}
      THREADS: ${THREADS:-4}
    tty: true
    stdin_open: true
    entrypoint: start-robot

networks:
  local:
    driver: bridge
    ipam:
      config:
        - subnet: 10.0.0.0/24

volumes:
  my-postgres:
    external: true
    name: my-postgres
I have monitored the app's stats and nothing is abnormal during testing. I have also tested the app manually in a browser, and it looks just fine, with nothing wrong about it.
Note: there is no mismatch between the chromedriver and Google Chrome versions (this shouldn't matter anyway, since the same robot container has worked against another instance where Docker was not used for the Django app).
Does anyone have an idea?
I hadn't noticed before that I run pabot with 8 processes while the Django app was started with only 2 Celery workers. As soon as I increased the Celery workers to 4, it worked. I'm not sure whether this is the actual cause, but it makes sense to me, and it worked.
celery -A server -c ${CELERY_CONCURRENCY:-2} worker
I am dockerizing a Python script, and I run it as CMD ["python", "script.py"] in the Dockerfile. When I bring the container up using docker-compose, it runs.
But when I docker exec into the container and do a ps -aux, I see the %CPU is 100%; because of this, the purpose of the service is not met.
If I do the same process manually, i.e. docker exec into the container and run python script.py by hand, it works well: only about 5% of the CPU is utilized, and the service works and gives the expected result.
The services as written in docker-compose:
consumer:
  restart: always
  image: consumer:latest
  build: ./consumer
  ports:
    - "8283:8283"
  depends_on:
    - redis
  environment:
    - REDIS_HOST=redis
redis:
  image: redis
  command: redis-server
  volumes:
    - ./redis_data:/data
  ports:
    - "6379:6379"
  restart: unless-stopped
It is a consumer application that consumes messages from the producer and writes them into a Redis server.
Can someone advise why such behavior is observed?
I'm trying to build a Flask app that has Kafka as an interface. I used a Python connector, kafka-python, and a Docker image for Kafka, spotify/kafkaproxy.
Below is the docker-compose file.
version: '3.3'
services:
  kafka:
    image: spotify/kafkaproxy
    container_name: kafka_dev
    ports:
      - '9092:9092'
      - '2181:2181'
    environment:
      - ADVERTISED_HOST=0.0.0.0
      - ADVERTISED_PORT=9092
      - CONSUMER_THREADS=1
      - TOPICS=PROFILE_CREATED,IMG_RATED
      - ZK_CONNECT=kafka7zookeeper:2181/root/path
  flaskapp:
    build: ./flask-app
    container_name: flask_dev
    ports:
      - '9000:5000'
    volumes:
      - ./flask-app:/app
    depends_on:
      - kafka
Below is the Python snippet I used to connect to Kafka. Here, I used the Kafka container's alias kafka to connect, as Docker takes care of mapping the alias to its IP address.
from kafka import KafkaConsumer, KafkaProducer

TOPICS = ['PROFILE_CREATED', 'IMG_RATED']
BOOTSTRAP_SERVERS = ['kafka:9092']

# kafka-python expects topic names as individual positional arguments
consumer = KafkaConsumer(*TOPICS, bootstrap_servers=BOOTSTRAP_SERVERS)
I got a NoBrokersAvailable error. From this, I could understand that the Flask app could not find the Kafka server.
Traceback (most recent call last):
File "./app.py", line 11, in <module>
consumer = KafkaConsumer("PROFILE_CREATED", bootstrap_servers=BOOTSTRAP_SERVERS)
File "/usr/local/lib/python3.6/site-packages/kafka/consumer/group.py", line 340, in __init__
self._client = KafkaClient(metrics=self._metrics, **self.config)
File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 219, in __init__
self.config['api_version'] = self.check_version(timeout=check_timeout)
File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 819, in check_version
raise Errors.NoBrokersAvailable()
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
Other Observations:
I was able to run ping kafka from the Flask container and get packets from the Kafka container.
When I run the Flask app locally, trying to connect to the Kafka container by setting BOOTSTRAP_SERVERS = ['localhost:9092'], it works fine.
UPDATE
As mentioned by cricket_007, given that you are using the docker-compose provided below, you should use kafka:29092 to connect to Kafka from another container. So your code would look like this:
from kafka import KafkaConsumer, KafkaProducer

TOPICS = ['PROFILE_CREATED', 'IMG_RATED']
BOOTSTRAP_SERVERS = ['kafka:29092']

# kafka-python expects topic names as individual positional arguments
consumer = KafkaConsumer(*TOPICS, bootstrap_servers=BOOTSTRAP_SERVERS)
END UPDATE
I would recommend using the Kafka images from Confluent Inc.; they have all sorts of example setups using docker-compose that are ready to use, and they are always updating them.
Try this out:
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  flaskapp:
    build: ./flask-app
    container_name: flask_dev
    ports:
      - '9000:5000'
    volumes:
      - ./flask-app:/app
I used this docker-compose.yml and added your service on top
Please note that:
The config used here exposes port 9092 for external connections to the broker, i.e. those from outside the Docker network. This could be from the host machine running Docker, or maybe further afield if you've got a more complicated setup. If the latter is true, you will need to change the value 'localhost' in KAFKA_ADVERTISED_LISTENERS to one that is resolvable to the Docker host from those remote clients.
Make sure you check out the other examples, may be useful for you especially when moving to production environments: https://github.com/confluentinc/cp-docker-images/tree/5.0.1-post/examples
Also worth checking:
It seems that you need to specify the api_version to avoid this error. For more details check here.
Version 1.3.5 of this library (which is the latest on PyPI) only lists certain API versions, 0.8.0 to 0.10.1. So unless you explicitly specify api_version to be (0, 10, 1), the client library's attempt to discover the version will cause a NoBrokersAvailable error.
# URL, CLIENT_ID and JsonSerializer are placeholders from the original post
producer = KafkaProducer(
    bootstrap_servers=URL,
    client_id=CLIENT_ID,
    value_serializer=JsonSerializer.serialize,
    api_version=(0, 10, 1)
)
This should work. Interestingly enough, setting the api_version accidentally fixes the issue, according to this:
When you set api_version the client will not attempt to probe brokers for version information. So it is the probe operation that is failing. One large difference between the version probe connections and the general connections is that the former only attempts to connect on a single interface per connection (per broker), whereas the latter -- general operation -- will cycle through all interfaces continually until a connection succeeds. #1411 fixes this by switching the version probe logic to attempt a connection on all found interfaces.
The actual issue is described here.
I managed to get this up and running using a network named stream_net between all services.
# for local development
version: "3.7"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - stream_net
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    networks:
      - stream_net
  flaskapp:
    build: ./flask-app
    container_name: flask_dev
    ports:
      - "9000:5000"
    volumes:
      - ./flask-app:/app
    networks:
      - stream_net
    depends_on:
      - kafka
networks:
  stream_net:
Connections from outside the containers use localhost:9092; connections within the network use kafka:29092.
Of course it may seem strange to put containers that are already on a network onto another, explicitly named network. But this way the containers can be addressed by their actual names. Maybe someone can explain exactly how this works, or it helps someone else to understand the core of the problem and to solve it properly.
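As a quick sanity check of the two listeners, here is a sketch (using one of the topic names from the question):

from kafka import KafkaConsumer

# From the host machine, via the published PLAINTEXT_HOST listener:
consumer = KafkaConsumer('PROFILE_CREATED', bootstrap_servers=['localhost:9092'])

# From another container on stream_net (e.g. flaskapp), via the internal
# PLAINTEXT listener:
consumer = KafkaConsumer('PROFILE_CREATED', bootstrap_servers=['kafka:29092'])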