I receive an error similar to this when trying to start ray using a docker image from a local repository
The proposed solutions involve making some changes to the nodes' docker config files, and then restarting the docker daemon.
However, I believe that (please correct if I'm wrong) the setup commands in ray config files are run within the docker containers, rather than directly on the node machines. So I'm unsure of how to apply them when using Ray.
How can I avoid the error?
I am attempting to create a simple producer and consumer with two Python scripts, using Kafka deployed on Microk8s. However, when running the producer.py script, I get the following error on repeat:
...|FAIL|rdkafka#producer-1| [thrd:...:9092/bootstrap]: ...:9092/bootstrap: Connect to ipv4#localhost:9092 failed: Connection refused (after 0ms in state CONNECT, ... identical error(s) suppressed
I am fairly confident that this issue is a result of the listeners not being configured correctly, but I have so far been unable to figure out what I need to do to fix them, due to what I assume is my complete lack of any knowledge in this area. I have reviewed these resources, in addition to several others from this site, but have been unable to find a solution, or at least a solution I can understand enough to act upon.
Steps to Reproduce:
The Python scripts to generate the producer and consumer can be found here.
For Microk8s installation, I followed these instructions. I also installed Helm, since my project requirements dictate that I use Helm charts.
I then installed Kafka using:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install kafka-release bitnami/kafka
The Python code in the linked post uses 'localhost:9092', as the error also shows - Connect to ipv4#localhost:9092 failed
If you are trying to run that code in a k8s pod, then you need to give it the broker's DNS address, not the local pod address (localhost resolves to the pod itself).
If you run the Python code from outside the k8s cluster, you need to expose the brokers externally with a NodePort or LoadBalancer Service, or an Ingress (as the linked Strimzi post shows; plus, you can still use the Strimzi Operator with Helm, so you don't really need the Bitnami charts).
At a high level, advertised.listeners tells clients how to connect to a specific broker. If you advertise localhost, the pod will try to connect to itself, even if the bootstrap connection (set up by listeners alone) worked. If you advertise kafka.svc.cluster.local, then it will try to connect to the kafka service in the default namespace... But you still need to actually set bootstrap.servers = kafka.svc.cluster.local:9092, for example.
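As a concrete illustration, here is a minimal sketch of a producer running inside the cluster that points bootstrap.servers at the Kafka service rather than localhost. It assumes the scripts use confluent-kafka (which the rdkafka error suggests), and the service name kafka-release.default.svc.cluster.local is a guess based on the helm install command above; check the actual name with kubectl get svc.
from confluent_kafka import Producer

# Assumed service DNS for the `kafka-release` Helm release in the `default`
# namespace; adjust to whatever `kubectl get svc` reports.
producer = Producer({"bootstrap.servers": "kafka-release.default.svc.cluster.local:9092"})

def delivery_report(err, msg):
    # Called once per message to report delivery success or failure.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

producer.produce("test-topic", value=b"hello from inside the cluster", callback=delivery_report)
producer.flush()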
This may be a sort of 101 question, but in setting this up for the first time there are no hints about such a fundamental and common task. Basically I have a headless Ubuntu image running as a Docker container on AWS, which gets built via GitHub Actions CI/CD. All is running well.
Inside ubuntu I have some python scripts, let's say a custom server, cron jobs, some software running etc. How can I know, remotely, if there were any errors logged by any of these? Let's keep it simple: How can I print an error message, from a python server inside ubuntu, that I can read from outside docker? Does AWS have any kind of web interface for viewing stdout/stderr logs? Or at least an ssh console? Any examples somewhere?
Furthermore, I've set up my docker image with healthchecks, to confirm that my servers running inside Ubuntu are online and serving. Those work: I can test them on localhost by doing docker ps, which shows Status 'healthy'. How do I see this same thing when live on AWS?
Have I really missed something this big? It feels like this should be the first thing flashing on the main page of setting up a docker on AWS.
There are a few things to unpack here, which you learn only after digging through a lot of stuff you don't need in order to get started, just so you can know how to get started.
Docker will log by default the output of the startup processes that you've described in your dockerfile setup, e.g. when you do ENTRYPOINT bash -C /home/ubuntu/my_dockerfile_sh_scripts/myStartupScripts.sh. If any subprocesses spawned by those processes also log to stdout/stderr, the messages should bubble up to the host process, and therefore be shown in the docker log. If they don't bubble up, look up subprocess stdout/stderr handling in linux.
Ok we know that, but where the heck is AWS's stats and logs page? Well, in Amazon CloudWatch™ of course. Didn't you already know about that term? Why, it says so right there when you create a docker, or on your ECS console next to your docker Clusters, or next to your running docker image Service. OH WAIT! No, no it does not! There is no utterance of "CloudWatch" anywhere. Well, there is this one page that has "CloudWatch" on it, which you can get to if you know the URL, but hey, look at that, you don't actually see any sort of logs coming from your code in docker anywhere on there, so ..yeah. So where do you see your actual logs and output? There is this Logs tab, in your Service's page (the page of the currently running docker image): https://eu-central-1.console.aws.amazon.com/ecs/home?region=eu-central-1#/clusters/your-cluster-name/services/your-cluster-docker-image-service/logs. This generically named and undescribed tab does not show some AWS-side status log for the service; it actually shows the docker logs I mentioned in point 1. Ok. How do I view this as a raw file or access it remotely via script? Well, I don't know. I guess you'll find out about that basic common task after reading a couple of manuals about setting up the AWS CLI (another thing you didn't know existed).
Like I said in point 1, docker cannot log generic operating system log messages, or show you log files generated by your server or by other software or jobs that are running but weren't described and started by your dockerfile/config. So how do we get AWS to see that? Well, it's a bit of a pain in the ass: you either have to replace your docker image's default OS's (e.g. ubuntu) logging driver with sudo yum install -y awslogs and set that up, or you can create symbolic links between specific log files and the stdout/stderr stream (the docker docs mention this). Also check Mark B's answer below. But probably the easiest thing is to write your own little scripts with short messages that print out to the main process what the status of things is (see the sketch below). Usually that's all you need unless you're an enterprise.
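As a concrete illustration of that last point, here is a minimal sketch of a Python process logging straight to stdout/stderr, so its messages end up in the docker log (and, once the Logs tab / awslogs driver is wired up, in CloudWatch). The logger name and messages are just placeholders.
import logging
import sys

# Send application logs to stdout so the container's main process, and
# therefore `docker logs` / the awslogs driver, sees them.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("my_server")

log.info("server started")           # visible via `docker logs <container>`
log.error("something went wrong")    # and in CloudWatch once logging is set up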
Is there any ssh or otherwise an AWS online command line interface page into the running docker, like you get in your localhost docker desktop? So you could maybe cd and ls browse or search for files and see if everything's fine? No. Make your own. Or better yet, avoid needing that in the first place, even though it's inconvenient for R&D.
Healthchecks. Where the heck do I see my docker healthchecks? The equivalent of the localhost method of just running the docker ps command. Well, by default there aren't any healthchecks shown anywhere on AWS. Why would you need healthchecks anyway? So what if your dockerfile has HEALTHCHECKs defined?..🙂 You have to set that up in Fargate™ (..whatever Fargate even means, because the name's not explained anywhere ("UX")). You have to create what is called a new Task Definition Revision. Go to your Clusters in Amazon ECS. Go to your cluster. Then click on your Service's entry in the Task Definition column of the services table at the bottom. Click on Create New Revision (new task definition revision). On the new page, click on your container in the Container Definitions table. On the next page, scroll down to HEALTHCHECK, bingo! Now what is this? What commands do I paste in here? It's not automatically taking the HEALTHCHECK that I defined in my dockerfile, so does that mean I must write something else here?? What environment are the healthchecks even run in? Is it my docker? Is it linux? Here's the answer: you paste in this box what you already wrote in your dockerfile's HEALTHCHECK. Just use http://127.0.0.1 (localhost) as you would in your local docker desktop testing environment. Now click Update. Click Create. K, we're still not done. Go back to Amazon ECS / Clusters / your cluster. Click on your service name in the services table. Click Update. Select the latest Revision. Check "force new deployment". Then keep clicking Next until finally you click Update Service. You can also define what triggers your image to be shut down on healthcheck failure, for example if it ran out of RAM. Now, @Amazon, I hope you take this answer and staple it to your shitty ass ECS experience.
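For reference, what you paste into that HEALTHCHECK box ends up as the healthCheck block of the container definition. Here is a hedged sketch of its shape, written as a Python dict; the curl command, port, and thresholds are placeholders and should mirror whatever your dockerfile's HEALTHCHECK already does.
# Hypothetical container-level healthCheck block (the same fields the ECS
# task definition UI exposes); adjust the command to your own endpoint.
health_check = {
    "command": ["CMD-SHELL", "curl -f http://127.0.0.1:8080/health || exit 1"],
    "interval": 30,     # seconds between checks
    "timeout": 5,       # seconds to wait for a response
    "retries": 3,       # consecutive failures before the task is marked unhealthy
    "startPeriod": 60,  # grace period after the container starts
}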
I swear the relentlessly, exclusively bottom-up UX of platforms like AWS and Azure is what's keeping the tutorial blogger industry alive... How would I know what AWS CloudWatch is, or that it even exists? There are no hints about these things anywhere while you set up. You'd think the first thing that flashes on your screen after you complete a docker setup would be "hey, 99.9% of people right now need to set up logging. You should use CloudWatch. And here's how you connect healthchecks to CloudWatch". But no, of course not..! 🙃
Instead, AWS's "engineer" approach here seems to be: here's a grid of holes in the wall, and here's a mess of wires next to it in a bucket. Now in order to do the common frequently done tasks you want to do, you must first read the manual for each hole, and the manual for each wire in the bucket, then find all of the holes and wires you need, and plug them in the right order (and for the right order you need to find a blog post because that always involves some level of not following the docs and definitely also magic).
I guess it's called "job security" for if you're an enterprise server engineer :)
I faced the same issue. I found the AWS wiki, but the /dev/stdout symbolic link doesn't work for me; the /proc/1/fd/1 symbolic link does.
Here is the solution:
Step 1. Add these commands to your Dockerfile:
# forward logs to docker log collector
RUN ln -sf /proc/1/fd/1 /var/log/console.log \
&& ln -sf /proc/1/fd/2 /var/log/error.log
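With those symlinks in place, anything the application writes to /var/log/console.log is effectively written to PID 1's stdout, so it shows up in docker logs. A minimal sketch (the logger setup is just an example):
import logging

# /var/log/console.log is symlinked to /proc/1/fd/1 by the Dockerfile above,
# so these records land in the container's stdout and in `docker logs`.
logging.basicConfig(filename="/var/log/console.log", level=logging.INFO)
logging.getLogger("app").info("hello from inside the container")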
Step 2. Refer to Step 2 of "Mark B"'s answer below.
Step 1. Update your docker image by deleting all the log files you care about and replacing them with symbolic links to stdout or stderr. For example, to capture logs in an nginx container, I may do the following in the Dockerfile:
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log
Step 2. Configure the awslogs driver in the ECS Task Definition, like so:
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "my-log-group",
"awslogs-region": "my-aws-region",
"awslogs-stream-prefix": "my-log-prefix"
}
}
And as long as you gave the ECS Execution Role permission to write to AWS Logs, log data will start appearing in CloudWatch Logs.
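If you manage task definitions from code rather than the console, here is a hedged sketch of registering the same log configuration via boto3; the family, container name, image, region, and role ARN are all placeholders.
import boto3

ecs = boto3.client("ecs", region_name="my-aws-region")

# Register a new task definition revision whose container ships its
# stdout/stderr to the "my-log-group" CloudWatch Logs group.
ecs.register_task_definition(
    family="my-task-family",
    executionRoleArn="arn:aws:iam::123456789012:role/my-ecs-execution-role",
    containerDefinitions=[
        {
            "name": "my-container",
            "image": "my-image:latest",
            "essential": True,
            "memory": 512,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "my-log-group",
                    "awslogs-region": "my-aws-region",
                    "awslogs-stream-prefix": "my-log-prefix",
                },
            },
        }
    ],
)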
I am trying to build a simple Python-based docker container. I am working at a corporation, behind a proxy, on Windows 10. Below is my Dockerfile:
FROM python:3.7.9-alpine3.11
WORKDIR ./
COPY requirements.txt ./
RUN pip install --proxy=http://XXXXXXX:8080 -r requirements.txt
COPY . /
EXPOSE 5000
CMD ["python", "application.py"]
But it's giving me the following error in cmd:
"failed to solve with frontend dockerfile.v0: failed to build LLB: failed to load cache key: failed to do request: Head https://registry-1.docker.io/v2/library/python/manifests/3.7.9-alpine3.11: proxyconnect tcp: EOF"
I've tried to figure out how to configure Docker's proxy using many links, but they keep referring to a file "/etc/sysconfig/docker" which I cannot find anywhere on Windows 10, or maybe I'm not looking in the right place.
Also I'm not sure this is only a proxy issue since I've seen people running into this issue without using a proxy.
I would highly appreciate anyone's help. Working at this corporation has already made me spend >10 hours doing something that took me 10 minutes on my Mac... :(
Thank you
You're talking about the most basic of Docker functionality. Normally, it has to connect to the Docker Hub on the internet to get base images. If you can't make this work with your proxy, you can either
preload your local cache with the necessary images
set up a Docker registry inside your firewall that contains all the images you'll need
Obviously, the easiest thing, probably by far, would be to figure out how to get Docker to connect to Docker Hub through your proxy.
In terms of getting Docker on Windows to work with your proxy, might this help? - https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-docker/configure-docker-daemon
Here's what it says about configuring a proxy:
To set proxy information for docker search and docker pull, create a Windows environment variable with the name HTTP_PROXY or HTTPS_PROXY, and a value of the proxy information. This can be completed with PowerShell using a command similar to this:
In PowerShell:
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://username:password@proxy:port/", [EnvironmentVariableTarget]::Machine)
Once the variable has been set, restart the Docker service.
In PowerShell:
Restart-Service docker
For more information, see Windows Configuration File on Docker.com.
I've also seen it mentioned that Docker for Windows allows you to set proxy parameters in its configuration GUI interface.
There is no need to pass proxy information in the Dockerfile.
There are predefined ARGs which can be used for this purpose.
HTTP_PROXY
HTTPS_PROXY
FTP_PROXY
You can pass the details when building the image
https://docs.docker.com/engine/reference/builder/#predefined-args
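For example, if you drive the build from Python with the docker SDK instead of the CLI, the same predefined ARGs can be supplied as build arguments. This is only a sketch: the proxy URL and tag are placeholders, and it assumes the docker Python package is installed and the daemon itself can already reach the registry.
import docker

client = docker.from_env()

# Equivalent of `docker build --build-arg HTTP_PROXY=... --build-arg HTTPS_PROXY=... .`
# Predefined proxy ARGs are used during the build but not persisted in the image history.
image, build_logs = client.images.build(
    path=".",
    tag="my-python-app:latest",
    buildargs={
        "HTTP_PROXY": "http://XXXXXXX:8080",
        "HTTPS_PROXY": "http://XXXXXXX:8080",
    },
)
for chunk in build_logs:
    if "stream" in chunk:
        print(chunk["stream"], end="")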
I do not see any runtime dependency of your container on the Internet, so running the container should work without an issue.
I am running a PyTorch training of CycleGAN inside a Docker image.
I want to use visdom to show the progress of the training (also recommended from the CycleGan project).
I can start a visdom.server inside the docker container and access it from outside the container. But when I try to use the basic visdom example in a bash session in the same container that is running the visdom.server, I get connection refused errors such as The requested URL could not be retrieved.
I think I need to configure the visdom.Visdom() in the example in some custom way to be able to send the data to the server.
Thankful for any help!
Notes
When I start visdom.server it says You can navigate to http://c4b7a2be26c4:8097, while all the examples mention localhost:8097.
I am trying to do this behind a proxy.
I realised that, in order to curl localhost:8097, I need to use curl --noproxy localhost localhost:8097. So I will have to do something similar inside visdom.
When setting http_proxy inside a docker container, you need to set no_proxy=localhost,127.0.0.1 as well in order to allow connections to localhost.
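Putting those two notes together, a minimal sketch of how the client could be configured inside the container; it assumes the server runs in the same container on the default port 8097.
import os
import visdom

# Make sure requests to the local visdom server bypass the corporate proxy.
os.environ["no_proxy"] = "localhost,127.0.0.1"

# Point the client at the server explicitly instead of relying on defaults.
vis = visdom.Visdom(server="http://localhost", port=8097)
assert vis.check_connection(), "could not reach visdom.server on localhost:8097"
vis.text("Hello from inside the container")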
Got the same problem, and I found that when you use a docker container to run the server, you cannot use the same docker container to run your code.
I am creating Python code that will be built into a docker image.
My intent is that the docker image will have the capability of running other docker images on the host.
Let's call these docker containers "daemon" and "workers," respectively.
I've proven that this concept works by running "daemon" using
-v /var/run/docker.sock:/var/run/docker.sock
I'd like to be able to write the code so that it will work anywhere that there exists a /var/run/docker.sock file.
Since I'm working on an OSX machine I have to use the Docker Quickstart terminal. As such, on my system there is no docker.sock file.
The docker-py documentation shows this as the way to capture the docker client:
from docker import Client
cli = Client(base_url='unix://var/run/docker.sock')
Is there some hackery I can do on my system so that I can instantiate the docker client that way?
Can I create the docker.sock file on my file system and have it sym-linked to the VM docker host?
I really don't want to have to build my docker image every time I want to test a single-line code change... help!!!
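One approach that might help is building the client from the environment instead of hard-coding the socket path; this is a sketch based on the docker-py docs for boot2docker/docker-machine setups and assumes the Quickstart terminal (or eval $(docker-machine env ...)) has set DOCKER_HOST, DOCKER_CERT_PATH and DOCKER_TLS_VERIFY.
from docker import Client
from docker.utils import kwargs_from_env

# Build the client from the docker-machine environment, so the same code
# talks TCP to the VM on OS X and falls back to the local unix socket on Linux.
kwargs = kwargs_from_env()
if kwargs.get("tls"):
    # boot2docker certificates typically don't match the VM's hostname.
    kwargs["tls"].assert_hostname = False

cli = Client(**kwargs)
print(cli.version())  # sanity check that we can reach the daemon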