I am using pg_cron to schedule a task that should repeat every hour.
I have installed it inside the Postgres container in a Docker environment.
I am calling the query that creates this job using Python from a different container.
I can see that the job is created successfully, but it is not executed because of a permission problem: pg_hba.conf is not set to trust and there is no .pgpass file.
But if I enable either of those, anyone can enter the database by running docker exec on the container and then psql.
Is there any way to avoid this security issue? In a production environment nobody should be able to enter the database without a password.
Either keep people from running docker exec on the container, or use something other than pg_cron.
I would feel nervous if random people were allowed to run docker exec on the container holding my database or my job scheduler.
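If dropping pg_cron is an option, one alternative is to drive the schedule from the Python container itself and authenticate with a password, so pg_hba.conf can stay on md5/scram. This is only a minimal sketch, assuming psycopg2 is available; the host, database, user, and the refresh_stats() job body are placeholders.
# Hedged sketch: run the hourly job from the Python container instead of pg_cron,
# authenticating with a password so pg_hba.conf does not need "trust".
# All connection details and the job body below are placeholders.
import time
import psycopg2

def run_hourly_task():
    conn = psycopg2.connect(
        host="postgres",              # Compose service name of the DB container
        dbname="appdb",
        user="scheduler",
        password="a-strong-password",
    )
    try:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT refresh_stats();")  # placeholder job body
    finally:
        conn.close()

while True:
    run_hourly_task()
    time.sleep(60 * 60)  # repeat every hour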
This may be a sort of 101 question, but in setting this up for the first time there are no hints about such a fundamental and common task. Basically I have a headless Ubuntu running as a Docker image inside AWS, which gets built via GitHub Actions CI/CD. All is running well.
Inside Ubuntu I have some Python scripts, let's say a custom server, cron jobs, some software running, etc. How can I know, remotely, if any of these logged errors? Let's keep it simple: how can I print an error message from a Python server inside Ubuntu that I can read from outside Docker? Does AWS have any kind of web interface for viewing stdout/stderr logs? Or at least an SSH console? Any examples somewhere?
Furthermore, I've set up my Docker image with healthchecks to confirm that the servers running inside Ubuntu are online and serving. They work: I can test them on localhost by running docker ps, which shows the status as 'healthy'. How do I see the same thing when live on AWS?
Have I really missed something this big? It feels like this should be the first thing flashing on the main page when setting up Docker on AWS.
There are a few things to unpack here, which you learn after digging through a lot of stuff you don't need in order to get started, just so you can know how to get started.
By default, Docker logs the output of the startup process described in your Dockerfile, e.g. when you do ENTRYPOINT bash -c /home/ubuntu/my_dockerfile_sh_scripts/myStartupScripts.sh. If any subprocesses spawned by that process also log to stdout/stderr, the messages should bubble up to the main process and therefore show up in the Docker log. If they don't bubble up, look up how subprocess stdout/stderr works on Linux.
OK, we know that, but where the heck is AWS's stats and logs page? Well, in Amazon CloudWatch™ of course. Didn't you already know about that term? Why, it says so right there when you create a Docker service, or on your ECS console next to your Docker clusters, or next to your running Docker image's service. OH WAIT! No, no it does not! There is no mention of "CloudWatch" anywhere. Well, there is this one page that has "CloudWatch" on it, which you can get to if you know the URL, but hey, look at that, you don't actually see any sort of logs coming from your code in Docker anywhere on there, so... yeah. So where do you see your actual logs and output? There is this Logs tab in your Service's page (the page of the currently running Docker image): https://eu-central-1.console.aws.amazon.com/ecs/home?region=eu-central-1#/clusters/your-cluster-name/services/your-cluster-docker-image-service/logs. This generically named and undescribed tab shows not some status for the service from the AWS side, but the Docker logs I mentioned in point 1. OK. How do I view this as a raw file, or access it remotely via a script? Well, I don't know. I guess you'll find out about that basic, common task after reading a couple of manuals about setting up the AWS CLI (another thing you didn't know existed).
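For the "access it remotely via a script" part, one option that does not require the AWS CLI is reading the log group directly with boto3. This is a rough sketch, assuming the task logs through the awslogs driver as described further down; the log group and stream names are placeholders you would look up in CloudWatch.
# Hedged sketch: read a container's CloudWatch log stream from a script.
# Assumes the awslogs log driver is configured; names below are placeholders.
import boto3

logs = boto3.client("logs", region_name="eu-central-1")
response = logs.get_log_events(
    logGroupName="my-log-group",
    logStreamName="my-log-prefix/my-container/1234567890abcdef1234567890abcdef",
    limit=100,
)
for event in response["events"]:
    print(event["timestamp"], event["message"])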
Like I said in point 1, Docker cannot log generic operating-system log messages, or show you log files generated by your server or by other software and jobs that weren't described and started by your Dockerfile/config. So how do we get AWS to see those? Well, it's a bit of a pain in the ass: you either install and set up the awslogs agent on your image's OS (e.g. sudo yum install -y awslogs), or you create symbolic links between specific log files and the stdout/stderr streams (the Docker docs mention this). Also check Mark B's answer. But probably the easiest thing is to write your own little scripts with short messages that print the status of things to the main process. Usually that's all you need unless you're an enterprise.
Is there any SSH or other AWS online command-line interface into the running container, like you get in your localhost Docker Desktop? So you could maybe cd and ls around or search for files and see if everything's fine? No. Make your own. Or better yet, avoid needing that in the first place, even though that's inconvenient for R&D.
Healthchecks. Where the heck do I see my Docker healthchecks? The equivalent of the localhost method of just running the docker ps command. Well, by default there aren't any healthchecks shown anywhere on AWS. Why would you need healthchecks anyway? So what if your Dockerfile has HEALTHCHECKs defined?..🙂 You have to set that up in Fargate™ (whatever Fargate even means, because the name's not written anywhere ("UX")). You have to create what is called a new Task Definition Revision. Go to your Clusters in Amazon ECS. Go to your cluster. Then you click on your Service's entry in the Task Definition column of the services table on the bottom. You click on Create New Revision (new task definition revision). On the new page you click on your container in the Container Definitions table. On the new page you scroll down to HEALTHCHECK, bingo! Now what is this? What commands do I paste in here? It's not automatically taking the HEALTHCHECK that I defined in my Dockerfile, so does that mean I must write something else here?? What environment are the healthchecks even run in? Is it my container? Is it Linux? Here's the answer: you paste into this box what you already wrote in your Dockerfile's HEALTHCHECK. Just use http://127.0.0.1 (localhost) as you would in your local Docker Desktop testing environment. Now click Update. Click Create. OK, now we're still not done. Go back to your Amazon ECS / Clusters / cluster. Click on your service name in the services table. Click Update. Select the latest Revision. Check "force new deployment". Then keep clicking Next until finally you click Update Service. You can also define what triggers your image to be shut down on healthcheck failure, for example if it ran out of RAM. If you would rather script this than click through the console, see the sketch after this paragraph. Now, Amazon, I hope you take this answer and staple it to your shitty ECS experience.
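A possible alternative to all that clicking, shown here only as a hedged sketch: register a new task definition revision with the healthcheck via boto3. The family, container name, and image are placeholders; the command mirrors what you would put in the Dockerfile's HEALTHCHECK.
# Hedged sketch: define the ECS container healthcheck in code instead of the console.
# Family, container name, and image are placeholders; adjust region and sizes to taste.
import boto3

ecs = boto3.client("ecs", region_name="eu-central-1")
ecs.register_task_definition(
    family="my-task-family",
    containerDefinitions=[
        {
            "name": "my-container",
            "image": "my-image:latest",
            "essential": True,
            "memory": 512,
            # Same command as the Dockerfile HEALTHCHECK; it runs inside the
            # container, so 127.0.0.1 refers to the container itself.
            "healthCheck": {
                "command": ["CMD-SHELL", "curl -f http://127.0.0.1/ || exit 1"],
                "interval": 30,
                "timeout": 5,
                "retries": 3,
                "startPeriod": 10,
            },
        }
    ],
)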
I swear the relentlessly, exclusively bottom-up UX of platforms like AWS and Azure is what is keeping the tutorial-blogger industry alive. How would I know what AWS CloudWatch is, or that it even exists? There are no hints about these things anywhere while you set up. You'd think the first thing that flashes on your screen after you complete a Docker setup would be "hey, 99.9% of people right now need to set up logging. You should use CloudWatch. And here's how you connect healthchecks to CloudWatch". But no, of course not..! 🙃
Instead, AWS's "engineer" approach here seems to be: here's a grid of holes in the wall, and here's a mess of wires next to it in a bucket. Now, in order to do the common, frequently done tasks you want to do, you must first read the manual for each hole and the manual for each wire in the bucket, then find all of the holes and wires you need and plug them in in the right order (and for the right order you need to find a blog post, because that always involves some level of not following the docs, and definitely also magic).
I guess it's called "job security" for if you're an enterprise server engineer :)
I faced the same issue. I found the AWS wiki; the /dev/stdout symbolic link doesn't work for me, but the /proc/1/fd/1 symbolic link does.
Here is the solution:
Step 1. Add these commands to your Dockerfile.
# forward logs to docker log collector
RUN ln -sf /proc/1/fd/1 /var/log/console.log \
&& ln -sf /proc/1/fd/2 /var/log/error.log
Step 2. Refer to Step 2 of Mark B's answer below.
Step 1. Update your Docker image by deleting all the log files you care about and replacing them with symbolic links to stdout or stderr. For example, to capture logs in an nginx container I might do the following in the Dockerfile:
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log
Step 2. Configure the awslogs driver in the ECS Task Definition, like so:
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "my-log-group",
"awslogs-region": "my-aws-region",
"awslogs-stream-prefix": "my-log-prefix"
}
}
And as long as you gave the ECS Execution Role permission to write to AWS Logs, log data will start appearing in CloudWatch Logs.
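If the execution role is missing that permission, one way to grant it is to attach AWS's managed policy for ECS task execution, which includes logs:CreateLogStream and logs:PutLogEvents. A hedged boto3 sketch; ecsTaskExecutionRole is the conventional role name and may differ in your account.
# Hedged sketch: attach the managed ECS task execution policy so the awslogs
# driver can write to CloudWatch Logs. The role name is an assumption.
import boto3

iam = boto3.client("iam")
iam.attach_role_policy(
    RoleName="ecsTaskExecutionRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
)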
I have some questions about Docker. I have very little knowledge about it, so kindly bear with me.
I have a python script that does something and writes into a PostgreSQL DB. Both are run on Docker. Python uses python:3.8.8-buster and PostgreSQL postgres:13. Using docker-compose up, I am able to instantiate both these services and I see the items inserted in the PostgreSQL table. When I docker-compose down, as usual, the services shut down as expected. Here are the questions I have:
When I run the container of the PostgreSQL service by itself (not using docker-compose up, but docker run and then docker exec) and then log into the db using psql, it doesn't use the db name mentioned in the docker-compose.yml file. It uses localhost, but with the username mentioned in the docker-compose.yml file. It also doesn't ask me for the password, although it's mentioned in the Dockerfile itself (not docker-compose.yml; for each of the services I have a Dockerfile that I build in the docker-compose.yml). Is that expected? If so, why?
After I've logged in, when I run SELECT * FROM DB_NAME; it displays 0 records. So basically it doesn't display the records written to the DB in the previous run. Why is that? How can I see the contents of the DB when it's not up? When the container is running (when I docker-compose up), I know I can see the records from pgAdmin (which, by the way, is also part of my docker-compose.yml file, and I have it only to make it easier to see whether the records have been written to the DB).
So after my script runs and writes into the db, it stops. Is there a way to restart it without docker-compose down and then docker-compose up? (In VS Code) when I simply run the script while docker-compose is still up, it says it cannot find the db (the one mentioned in the docker-compose.yml file). So I have to go back and change the db name in the script to point to localhost - this circles back to question #1.
I am new to docker, and I am trying my best to wrap my head around all this.
This behavior depends on your specific setup. I will have to see the Dockerfile(s) and docker-compose.yaml in order to give a suitable answer.
This is probably caused by mounting an anonymous volume to your postgres service instead of a named volume. Anonymous volumes are not re-attached after a docker-compose down / docker-compose up cycle; named volumes are.
Here's a docker-compose.yaml example of how to mount a named volume called database:
version: '3.8'

# Defining the named volume
volumes:
  database:

services:
  database:
    image: 'postgres:latest'
    restart: 'always'
    environment:
      POSTGRES_USER: 'admin'
      POSTGRES_PASSWORD: 'admin'
      POSTGRES_DB: 'app'
    volumes:
      # Mounting the named volume
      - 'database:/var/lib/postgresql/data/'
    ports:
      - '5432:5432'
I assume this depends more on the contents of your script than on the way you configured your Docker postgres service. Postgres does not shut down after simply writing data to it. But again, I will have to see the Dockerfile(s) and docker-compose.yaml (and the script) in order to provide a more suitable answer.
If you docker run an image, it always creates a new container, and it never looks at the docker-compose.yml. If you, for example, run
docker run --name postgres postgres
docker exec -it postgres ...
that starts a new container based on the postgres:latest image, with no particular storage or networking setup. That's why you can't use the Compose host name of the container or see any of the data that your Compose setup would normally have.
You can use docker-compose up to start a specific service and its dependencies, though:
docker-compose up -d postgres
Once you do this, you can use ordinary tools like psql to connect to the database through its published ports:
psql -h localhost -p 5433 my_db
You should not normally need debugging tools like docker exec; if you do, there is a Compose variant that knows about the Compose service names:
# For debugging use only -- not the primary way to interact with the DB
docker-compose exec postgres psql my_db
After my script runs, and it writes into the db, it stops. Is there a way to restart it?
Several options:
Make your script not stop, in whatever way. Frequently a Docker container will have something like an HTTP service that can accept requests and act on them.
Re-running docker-compose up -d (without explicitly down first) will restart anything that's stopped or anything whose Compose configuration has changed.
You can run a one-off script directly on the host, with configuration pointing at your database's published ports.
It's relevant here that "in the Compose environment" and "directly on a developer system" are different environments, and you will need a mechanism like environment variables to communicate this. In the Compose environment you will need the database container's name and the default PostgreSQL port 5432; on a developer system you will typically need localhost as the host name and whichever port has been published. You cannot hard-code this configuration in your application; the sketch below shows one way to wire it up.
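A minimal sketch of reading that configuration from environment variables, assuming psycopg2 is installed; the variable names and the default values (matching the example Compose file above) are assumptions.
# Hedged sketch: read DB connection settings from the environment so the same
# script works inside Compose and on the host. Variable names are assumptions.
import os
import psycopg2

conn = psycopg2.connect(
    host=os.environ.get("DB_HOST", "localhost"),   # "database" inside Compose
    port=int(os.environ.get("DB_PORT", "5432")),   # published port on the host
    dbname=os.environ.get("DB_NAME", "app"),
    user=os.environ.get("DB_USER", "admin"),
    password=os.environ.get("DB_PASSWORD", "admin"),
)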
There is a long list of DAGs and associated Airflow variables on a remote instance of Airflow, a copy of which is running on my local system. All the variables from the remote Airflow instance have been imported into my local Airflow instance.
I have installed the Airflow image on Docker and then started the container. Everything works fine and I can access the Airflow UI from my local system.
Problem:
Whenever I restart the Airflow container, all the variables that were imported during the previous container run become invalid.
Workaround
Import the variables again to fix the variable-related errors.
However, it's really frustrating to import the variables every time the container starts. There must be a smarter way of achieving this. Please help me understand what I am doing wrong.
A new encryption key is generated when the Docker container is restarted.
To ensure that the same encryption key is used, you will have to either hardcode a FERNET_KEY in the config file or pass it as an environment variable when the container is first run.
docker run -it -p 8888:8080 -v D:\dev\Dags:/usr/local/airflow/dags -e FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs= --name my_airflow_dags airflow_image
The Fernet key must be a valid key (32 url-safe base64-encoded bytes). Once this key is provided, Docker can reuse it every time the container is restarted.
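If you need to generate a valid key, a small sketch using the cryptography package (which Airflow itself uses for Fernet):
# Sketch: generate a valid Fernet key to pass as FERNET_KEY /
# AIRFLOW__CORE__FERNET_KEY (requires the "cryptography" package).
from cryptography.fernet import Fernet

print(Fernet.generate_key().decode())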
The root cause of this problem is Airflow's encryption mechanism for the key-value variables.
When you import your variables manually, their is_encrypted attributes are automatically set to True.
Whenever you restart the container, a new encryption key is generated, so the old variables become invalid.
You have 3 options:
Set the fernet_key explicitly in airflow.cfg
Set the AIRFLOW__CORE__FERNET_KEY Environment Variable in docker-compose.yml
Set the is_encrypted attributes to False (Admin UI, CLI, an UPDATE SQL query, ...)
I personally chose the second one, so my docker-compose.yml file looks like this:
environment:
  - LOAD_EX=n
  - EXECUTOR=Local
  - AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
Thanks to wittfabian.
I am connecting to a remote server through
ssh user@server.com
and run
python script.py
in the appropriate directory. However, I get the error
ImportError: No module named numpy
even though I know the module is installed and the script runs with no problems when I am physically logged in to that server.
None of the answers I was able to find worked (for example this, and this). Do you have any ideas as to how I can run the script using ssh?
The remote server has Python 2.6.6 installed, and
which python
returns
/usr/bin/python
The remote server runs CentOS.
See the similar problem described here: Why does an SSH remote command get fewer environment variables then when run manually?.
Compare your environment variables in the local (physical) mode to the remote mode by running env in both cases. Move missing variables from your local profile to /etc/profile. Then log out from ssh session and connect again.
Another approach: if you don't want to change anything, then after ssh switch to your user via su - <your user>. This may look weird because you are already logged in as this user. The difference is that after su all your environment variables will be set as they are in local (physical) mode. Advantage: it is quick. Disadvantage: you will have to do it each time you want to run your Python script, so the first approach of configuring /etc/profile may be better in the long run.
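To see what actually differs between the two sessions, here is a small diagnostic you could run both while physically logged in and over plain ssh; it only prints interpreter and environment details, and should work on both Python 2 and 3.
# Sketch: print the interpreter and environment seen by the script, to compare
# an interactive login with a plain "ssh ... python script.py" invocation.
import os
import sys

print("executable: %s" % sys.executable)
print("version: %s" % sys.version)
print("sys.path: %s" % sys.path)
print("PATH = %s" % os.environ.get("PATH"))
print("PYTHONPATH = %s" % os.environ.get("PYTHONPATH"))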
I am having trouble accessing the Docker daemon from a client using docker-py in Python. I started the Docker daemon with the command
sudo docker -d & and the output was [1] 4894. Then, as root, I tried to access the daemon from Python using the code that I got from here
from docker import Client
cli = Client(base_url='unix://var/run/docker.sock')
cli.containers()
This gave me the error:
requests.exceptions.ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
I also tried
cli = Client(base_url='tcp://127.0.0.1:4894')
but it gave me the same error.
It seems that the /var/run/docker.sock file has incorrect permissions. As the Docker daemon is started as root, the permissions are probably too restrictive.
If you change the permissions to allow other users to access it, you should have more success (e.g. o=rwx).
The issue is indeed that /var/run/docker.sock has the incorrect permissions.
To fix it, you need to give the current user access to this file.
However, on Linux, giving o=rwx rights to /var/run/docker.sock is very dangerous as it allows any user and service on the system to run commands as root. Indeed access to /var/run/docker.sock implies full root access to the machine. See https://docs.docker.com/engine/security/#docker-daemon-attack-surface
A less dangerous approach consists in creating the group docker and adding the current user to this group. See https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user
However, this approach is still potentially dangerous as it gives the current user full root access without the protections that sudo offers (i.e., asking for the user's password from time to time and logging sudo calls).
See also What is the Docker security risk of /var/run/docker.sock?
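Once the current user is in the docker group (and has logged out and back in), the original snippet should work unchanged. Below is a hedged sketch of the same container listing with a quick group-membership guard, using the newer "docker" SDK for Python (the package that superseded docker-py's Client); it assumes that SDK is installed.
# Hedged sketch: verify docker-group membership before talking to the daemon
# over the unix socket, then list containers with the "docker" SDK for Python.
import grp
import os

import docker

docker_gid = grp.getgrnam("docker").gr_gid
if docker_gid not in os.getgroups():
    raise SystemExit("Add this user to the 'docker' group and log in again first.")

client = docker.DockerClient(base_url="unix://var/run/docker.sock")
print(client.containers.list())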
(I unfortunately cannot comment hence I write my comment as an answer.)