Load balance docker swarm - python

I have a Docker swarm with one HAProxy container and 3 Python web app containers. The HAProxy container exposes port 80 and should load balance across the 3 containers of my app (using leastconn).
Here is my docker-compose.yml file:
version: '3'
services:
  scraper-node:
    image: scraper
    ports:
      - 5000
    volumes:
      - /profiles:/profiles
    command: >
      bash -c "
      cd src;
      gunicorn src.interface:app \
      --bind=0.0.0.0:5000 \
      --workers=1 \
      --threads=1 \
      --timeout 500 \
      --log-level=debug \
      "
    environment:
      - SERVICE_PORTS=5000
    deploy:
      replicas: 3
      update_config:
        parallelism: 5
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 3
        window: 120s
    networks:
      - web
  proxy:
    image: dockercloud/haproxy
    depends_on:
      - scraper-node
    environment:
      - BALANCE=leastconn
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 80:80
    networks:
      - web
networks:
  web:
    driver: overlay
When I deploy this swarm (docker stack deploy --compose-file=docker-compose.yml scraper) I get all of my containers:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
245f4bfd1299 scraper:latest "/docker-entrypoin..." 21 hours ago Up 19 minutes 80/tcp, 5000/tcp, 8000/tcp scraper_scraper-node.3.iyi33hv9tikmf6m2wna0cypgp
995aefdb9346 scraper:latest "/docker-entrypoin..." 21 hours ago Up 19 minutes 80/tcp, 5000/tcp, 8000/tcp scraper_scraper-node.2.wem9v2nug8wqos7d97zknuvqb
a51474322583 scraper:latest "/docker-entrypoin..." 21 hours ago Up 19 minutes 80/tcp, 5000/tcp, 8000/tcp scraper_scraper-node.1.0u8q4zn432n7p5gl93ohqio8e
3f97f34678d1 dockercloud/haproxy "/sbin/tini -- doc..." 21 hours ago Up 19 minutes 80/tcp, 443/tcp, 1936/tcp scraper_proxy.1.rng5ysn8v48cs4nxb1atkrz73
And when I display the haproxy container log, it looks like it recognizes the 3 python containers:
INFO:haproxy:dockercloud/haproxy 1.6.6 is running outside Docker Cloud
INFO:haproxy:Haproxy is running in SwarmMode, loading HAProxy definition through docker api
INFO:haproxy:dockercloud/haproxy PID: 6
INFO:haproxy:=> Add task: Initial start - Swarm Mode
INFO:haproxy:=> Executing task: Initial start - Swarm Mode
INFO:haproxy:==========BEGIN==========
INFO:haproxy:Linked service: scraper_scraper-node
INFO:haproxy:Linked container: scraper_scraper-node.1.0u8q4zn432n7p5gl93ohqio8e, scraper_scraper-node.2.wem9v2nug8wqos7d97zknuvqb, scraper_scraper-node.3.iyi33hv9tikmf6m2wna0cypgp
INFO:haproxy:HAProxy configuration:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
log-send-hostname
maxconn 4096
pidfile /var/run/haproxy.pid
user haproxy
group haproxy
daemon
stats socket /var/run/haproxy.stats level admin
ssl-default-bind-options no-sslv3
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:AES256-GCM-SHA384:AES256-SHA256:AES256-SHA:DHE-DSS-AES128-SHA:DES-CBC3-SHA
defaults
balance leastconn
log global
mode http
option redispatch
option httplog
option dontlognull
option forwardfor
timeout connect 5000
timeout client 50000
timeout server 50000
listen stats
bind :1936
mode http
stats enable
timeout connect 10s
timeout client 1m
timeout server 1m
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth stats:stats
frontend default_port_80
bind :80
reqadd X-Forwarded-Proto:\ http
maxconn 4096
default_backend default_service
backend default_service
server scraper_scraper-node.1.0u8q4zn432n7p5gl93ohqio8e 10.0.0.5:5000 check inter 2000 rise 2 fall 3
server scraper_scraper-node.2.wem9v2nug8wqos7d97zknuvqb 10.0.0.6:5000 check inter 2000 rise 2 fall 3
server scraper_scraper-node.3.iyi33hv9tikmf6m2wna0cypgp 10.0.0.7:5000 check inter 2000 rise 2 fall 3
INFO:haproxy:Launching HAProxy
INFO:haproxy:HAProxy has been launched(PID: 12)
INFO:haproxy:===========END===========
But when I send a GET request to http://localhost I get an error message:
<html>
<body>
<h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body>
</html>

There were two problems:
The command in the docker-compose.yml file should be on one line.
The scraper image should expose port 5000 (in its Dockerfile).
Once I fixed those, I deployed the swarm the same way (with stack), and the proxy container recognized the python containers and was able to load balance between them.

A 503 error usually means a failed health check to the backend server.
Your stats page might be helpful here: if you mouse over the LastChk column of one of your DOWN backend servers, HAProxy will give you a vague summary of why that server is DOWN.
It does not look like you configured an HTTP health check (option httpchk) for your default_service backend. Can you reach any of your backend servers directly (e.g. curl --head 10.0.0.5:5000)? From the HAProxy documentation:
[R]esponses 2xx and 3xx are
considered valid, while all other ones indicate a server failure, including
the lack of any response.
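
The rule quoted above can be reproduced by hand when debugging. The following is a minimal sketch, not HAProxy's own implementation: the helper name http_check is hypothetical, it uses only the Python standard library, and it sends a HEAD request (like the curl --head suggestion) and treats 2xx and 3xx as healthy and everything else, including no response at all, as a failure.

```python
import http.client


def http_check(host, port, path="/", timeout=2.0):
    """Return True if a HEAD request to host:port gets a 2xx or 3xx status.

    Hypothetical helper for manual debugging; it only mirrors the rule
    quoted from the HAProxy documentation.
    """
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("HEAD", path)
            status = conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or malformed reply: no valid response.
        return False
    return 200 <= status < 400
```

Calling http_check("10.0.0.5", 5000) from a container attached to the same overlay network would show whether that backend answers at all; from the host, the overlay addresses are usually not reachable.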

Related

Using pika, how to connect to rabbitmq running in docker, started with docker-compose with external network?

I have the following docker-compose file:
version: '2.3'
networks:
  default: { external: true, name: $NETWORK_NAME } # NETWORK_NAME in .env file is `uv_atp_network`.
services:
  car_parts_segmentor:
    # container_name: uv-car-parts-segmentation
    image: "uv-car-parts-segmentation:latest"
    ports:
      - "8080:8080"
    volumes:
      - ../../../../uv-car-parts-segmentation/configs:/uveye/configs
      - /isilon/:/isilon/
      # - local_data_folder:local_data_folder
    command: "--run_service rabbit"
    runtime: nvidia
    depends_on:
      rabbitmq_local:
        condition: service_started
    links:
      - rabbitmq_local
    restart: always
  rabbitmq_local:
    image: 'rabbitmq:3.6-management-alpine'
    container_name: "rabbitmq"
    ports:
      - ${RABBIT_PORT:?unspecified_rabbit_port}:5672
      - ${RABBIT_MANAGEMENT_PORT:?unspecified_rabbit_management_port}:15672
When this runs, docker ps shows
21400efd6493 uv-car-parts-segmentation:latest "python /uveye/app/m…" 5 seconds ago Up 1 second 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp joint_car_parts_segmentor_1
bf4ab8581f1f rabbitmq:3.6-management-alpine "docker-entrypoint.s…" 5 seconds ago Up 4 seconds 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, :::5672->5672/tcp, 15671/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp, :::15672->15672/tcp rabbitmq
I want to create a connection to that rabbitmq. The user:pass is guest:guest.
I was unable to do so; every attempt failed with the very uninformative AMQPConnectionError.
The code below runs in another, unrelated container.
connection = pika.BlockingConnection(pika.URLParameters("amqp://guest:guest@rabbitmq/"))
connection = pika.BlockingConnection(pika.URLParameters("amqp://guest:guest@localhost/"))
Also tried with
$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' rabbitmq
172.27.0.2
and
connection = pika.BlockingConnection(pika.URLParameters("amqp://guest:guest@172.27.0.2/"))
Also tried with
credentials = pika.credentials.PlainCredentials(
    username="guest",
    password="guest"
)
parameters = pika.ConnectionParameters(
    host=ip_address,  # tried all above options
    port=5672,
    credentials=credentials,
    heartbeat=10,
)
Note that the container car_parts_segmentor is able to see the container rabbitmq. Both are started by docker-compose.
My assumption is that this has to do with the uv_atp_network both containers live in, and that I am trying to access a container inside that network from outside the network.
Is this really the problem?
If so, how can this be achieved?
For the future: how can I get more informative errors from pika?
As I suspected, the problem was that the name rabbitmq existed only in the network uv_atp_network.
The code attempting to connect runs inside a container of its own, which was not attached to that network.
Solution: connect the current container to the network:
import socket
import docker
# Requires access to the Docker daemon, e.g. a mounted /var/run/docker.sock.
client = docker.from_env()
network_name = "uv_atp_network"
# Inside a container, the hostname defaults to the container ID.
atp_container = client.containers.get(socket.gethostname())
client.networks.get(network_name).connect(container=atp_container.id)
After this, the code in the question does work, because rabbitmq can be resolved.
connection = pika.BlockingConnection(pika.URLParameters("amqp://guest:guest@rabbitmq/"))
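
On the question of more informative errors: one option is to check name resolution and the TCP connection separately before handing the host to pika. This is a minimal sketch using only the standard library. The function name diagnose is hypothetical, it does not speak AMQP, and it only distinguishes "the name does not resolve" (the situation described above) from "the port is unreachable".

```python
import socket


def diagnose(host, port=5672, timeout=3.0):
    """Return a short message saying which stage of reaching host:port fails."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The name is unknown from where this code runs (e.g. wrong network).
        return f"DNS: cannot resolve {host!r} ({exc})"
    address = infos[0][4][0]
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return f"TCP: {host!r} resolves to {address} but port {port} is unreachable ({exc})"
    return f"OK: {host!r} resolves to {address} and port {port} accepts connections"
```

Running diagnose("rabbitmq") from the container that fails to connect would be expected to report a DNS failure before the network is attached, and an OK result afterwards.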

Cannot create local docker-compose file for Confluent Cloud Setup

I want to create a local kafka setup using docker-compose that replicates very closely the secured kafka setup in confluent cloud.
The cluster I have in Confluent Cloud can be connected to using
from confluent_kafka import Consumer

c = Consumer(
    {
        "bootstrap.servers": "broker_url",
        "sasl.mechanism": "PLAIN",
        "security.protocol": "SASL_SSL",
        "sasl.username": "key",
        "sasl.password": "secret",
        "group.id": "consumer-name",
    }
)
But I am unable to create a docker-compose.yml locally that has the same config and can be connected to using the same code.
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.2.0
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:6.2.0
    depends_on:
      - zookeeper
    ports:
      - '9092:9092'
      - '19092:19092'
    expose:
      - '29092'
    environment:
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENERS: INSIDE-DOCKER-NETWORK://0.0.0.0:29092,OTHER-DOCKER-NETWORK://0.0.0.0:19092,HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: INSIDE-DOCKER-NETWORK://kafka:29092,OTHER-DOCKER-NETWORK://host.docker.internal:19092,HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE-DOCKER-NETWORK:PLAINTEXT,OTHER-DOCKER-NETWORK:PLAINTEXT,HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE-DOCKER-NETWORK
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      # Allow to swiftly purge the topics using retention.ms
      KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 100
      # Security Stuff
      KAFKA_LISTENER_NAME_EXTERNAL_PLAIN_SASL_JAAS_CONFIG: |
        org.apache.kafka.common.security.plain.PlainLoginModule required \
        username="broker" \
        password="broker" \
        user_alice="alice-secret";
      KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
      KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: SASL_SSL
The docker-compose file above is what I have so far, but it's not working.
This is the error I get when I try connecting with the same code:
%3|1628019607.757|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://localhost:9092/bootstrap]: sasl_ssl://localhost:9092/bootstrap: SSL handshake failed: Disconnected: connecting to a PLAINTEXT broker listener? (after 9ms in state SSL_HANDSHAKE)
Here's your hint: Disconnected: connecting to a PLAINTEXT broker listener?
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP only has PLAINTEXT mappings, so there is no SASL_SSL listener that your client can connect to.
As for the one place where you did configure SASL_SSL: you only have one broker, so KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL doesn't really do anything.
In this demo repo you can find brokers that use all possible protocol mappings
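
To make the mismatch concrete, here is a small sketch that parses the KAFKA_LISTENER_SECURITY_PROTOCOL_MAP value from the compose file in the question and lists the listeners that match a given client security.protocol. The helper names are hypothetical and the code only inspects the string; it does not talk to a broker.

```python
def parse_protocol_map(value):
    """Parse 'NAME:PROTOCOL,NAME:PROTOCOL' into a dict of name -> protocol."""
    mapping = {}
    for entry in value.split(","):
        name, _, protocol = entry.strip().partition(":")
        mapping[name] = protocol
    return mapping


def listeners_supporting(value, client_protocol):
    """Return the listener names whose protocol equals client_protocol."""
    return [
        name
        for name, protocol in parse_protocol_map(value).items()
        if protocol == client_protocol
    ]


# Value copied from the compose file in the question.
PROTOCOL_MAP = (
    "INSIDE-DOCKER-NETWORK:PLAINTEXT,"
    "OTHER-DOCKER-NETWORK:PLAINTEXT,"
    "HOST:PLAINTEXT"
)

print(listeners_supporting(PROTOCOL_MAP, "SASL_SSL"))   # []
print(listeners_supporting(PROTOCOL_MAP, "PLAINTEXT"))  # all three listeners
```

The empty list for SASL_SSL is the situation the error message hints at: the client asks for a TLS handshake on a listener that is mapped to PLAINTEXT.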

Where to check swarm load balancer logs?

I have the following docker compose file:
---
version: '3.7'
services:
  myapi:
    image: tiangolo/uwsgi-nginx-flask:python3.7
    env_file: apivars.env
    logging:
      driver: syslog
      options:
        syslog-address: "udp://127.0.0.1:514"
        tag: tags
        labels: labels
    ports:
      - "8080:80"
    deploy:
      placement:
        constraints:
          - node.role != manager
      mode: replicated
      replicas: 32
      update_config:
        parallelism: 4
        delay: 5s
        order: start-first
...
I have a load balancer which forwards requests to this swarm manager.
My understanding is that if I hit www.myapi.com, the request goes to the load balancer, then to the swarm manager, and the swarm manager then sends it to one of the 32 replicas.
The issue is that the load balancer logs report some 502 errors.
# head -n1 /var/log/haproxy.log
Apr 28 09:35:28 localhost haproxy[43117]: 172.19.9.1:50220 [28/Apr/2020:09:35:08.549] main~ API_Production/swarmnode5 0/0/1/19952/19953 502 309 - - ---- 97/97/10/1/0 0/0 "GET /v2/students/?includeFields=name,id&per_page=1000&page=88 HTTP/1.1"
How can I check whether the request is reaching the swarm manager or swarmnode5?
I checked the nginx logs, but they do not report any 502 errors. There are some exceptions, but if the exception is in the code, why doesn't nginx log that API call and its response?
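
As a starting point for reading the log line above, this sketch pulls out the fields that say where HAProxy sent the request. It assumes the standard HAProxy HTTP log layout, in which the frontend name is followed by backend/server, five slash-separated timers in milliseconds, and the status code. The function name and the timer labels are hypothetical choices, and the regular expression has only been matched against the single line shown here.

```python
import re

LOG_RE = re.compile(
    r"(?P<client>\S+) \[(?P<accepted>[^\]]+)\] (?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>[-+\d]+(?:/[-+\d]+){4}) "
    r"(?P<status>\d{3}) "
)

# Labels for the five timers, in log order (all in milliseconds).
TIMER_NAMES = ("request", "queue", "connect", "response", "total")


def parse_haproxy_line(line):
    """Return backend, server, status and timers from one log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    values = (int(t) for t in fields["timers"].split("/"))
    fields["timers"] = dict(zip(TIMER_NAMES, values))
    return fields


# Log line copied from the question.
LINE = (
    "Apr 28 09:35:28 localhost haproxy[43117]: 172.19.9.1:50220 "
    "[28/Apr/2020:09:35:08.549] main~ API_Production/swarmnode5 "
    "0/0/1/19952/19953 502 309 - - ---- 97/97/10/1/0 0/0 "
    '"GET /v2/students/?includeFields=name,id&per_page=1000&page=88 HTTP/1.1"'
)

parsed = parse_haproxy_line(LINE)
print(parsed["backend"], parsed["server"], parsed["status"], parsed["timers"])
```

For this line the server field is swarmnode5, the connect timer is 1 ms, and the response timer is 19952 ms, which suggests HAProxy did connect to swarmnode5 and then waited about 20 seconds before the 502 was recorded.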

Dockerized app successfully deployed but times out on warm up and

I have a dockerized app which I deployed on Azure App Service perfectly fine. I then made some changes to the html file, pushed the new version to Docker Hub, and finally set Azure to pull the latest tag. However, the app has now stopped working altogether. I deleted the Azure resources and services and attempted to recreate them. Here is that process:
The log stream for the container looks like this:
2020-01-18T20:21:10.830911857Z * Serving Flask app "main" (lazy loading)
2020-01-18T20:21:10.834294374Z * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2020-01-18 20:31:41.719 INFO - Pulling image from Docker hub: gdeol4/azure-ml:version-8
2020-01-18 20:31:41.837 INFO - version-8 Pulling from xyz
2020-01-18 20:31:41.839 INFO - Digest: sha256:xyz
2020-01-18 20:31:41.840 INFO - Status: Image is up to date for xyz
2020-01-18 20:31:41.845 INFO - Pull Image successful, Time taken: 0 Minutes and 0 Seconds
2020-01-18 20:31:41.858 INFO - Starting container for site docker run -d -p 2113:80
-e WEBSITES_ENABLE_APP_SERVICE_STORAGE=false
-e PORT=80
-e WEBSITE_ROLE_INSTANCE_ID=0
-e HTTP_LOGGING_ENABLED=1
2020-01-18 20:31:42.337 INFO - Initiating warmup request to container
2020-01-18 20:35:32.524 ERROR - Container xyz for site xyz did not start within expected time limit. Elapsed time = 230.1865177 sec
2020-01-18 20:35:32.525 ERROR - Container xyz didn't respond to HTTP pings on port: 80, failing site start. See container logs for debugging.
2020-01-18 20:35:32.535 INFO - Stoping site because it failed during startup.
In this case, the solution was to set the WEBSITES_PORT application setting to 5000 (the port the app listens on for requests).
