Docker, Supervisord and logging - how to consolidate logs in docker logs? - python

So, experimenting with Docker + Supervisord + Django app via uWSGI. I have the whole stack working fine, but need to tidy up the logging.
If I launch supervisor in non-daemon mode,
/usr/bin/supervisord -n
Then I get the logging output for supervisor played into the docker logs stdout. However, if supervisord is in daemon mode, its own logs get stashed away in the container filesystem, and the logs of its applications do too - in their own app__stderr/stdout files.
What I want is to log both supervisor, and application stdout to the docker log.
Is starting supervisord in non-daemon mode a sensible idea for this, or does it cause unintended consequences? Also, how do I get the application logs also played into the docker logs?

I accomplished this using supervisor-stdout.
Install supervisor-stdout in your Docker image:
RUN apt-get install -y python-pip && pip install supervisor-stdout
Supervisord Configuration
Edit your supervisord.conf to look like this:
[program:myprogram]
command=/what/ever/command
stdout_events_enabled=true
stderr_events_enabled=true
[eventlistener:stdout]
command = supervisor_stdout
buffer_size = 100
events = PROCESS_LOG
result_handler = supervisor_stdout:event_handler
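For this to reach docker logs, supervisord itself also has to stay in the foreground as the container's main process. A minimal sketch (the config path is an assumption, adjust to your image):
; supervisord.conf -- keep supervisord in the foreground
[supervisord]
nodaemon=true
# Dockerfile -- make supervisord the container's PID 1 process
CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]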

A Docker container is like a Kleenex: you use it, then you throw it away. To stay "alive", a container needs something running in the foreground (whereas daemons run in the background); that's why you are using Supervisord.
So you need to redirect/merge each process's output (access and error) into the Supervisord output you see when running your container.
As Drew said, everyone is using https://github.com/coderanger/supervisor-stdout to achieve this (in my opinion it should be added to the supervisord project itself!). One thing Drew didn't mention: you may need to add
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
To the supervisord program configuration block.
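Put together, the program block from the example above would look roughly like this (the command is still just a placeholder):
[program:myprogram]
command=/what/ever/command
stdout_events_enabled=true
stderr_events_enabled=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0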
Another very useful trick: if your process logs to a file instead of stdout, you can ask supervisord to watch that file:
[program:php-fpm-log]
command=tail -f /var/log/php5-fpm.log
stdout_events_enabled=true
stderr_events_enabled=true
This will redirect the contents of php5-fpm.log to stdout, and then to supervisord's stdout via supervisor-stdout.

supervisor-stdout requires installing python-pip, which downloads ~150 MB; for a container, I think that is a lot just to install another tool.
Redirecting logfile to /dev/stdout works for me:
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
http://veithen.github.io/2015/01/08/supervisord-redirecting-stdout.html

I agree that not using daemon mode sounds like the best solution, but I would probably employ the same strategy you would use with actual physical servers or some kind of VM setup: centralized logging.
You could use something self-hosted like Logstash inside the container to collect logs and send them to a central server, or use a commercial service like Loggly or Papertrail to do the same.

Today's best practice is to keep Docker images minimal. For me, the ideal container for a Python application contains just my code, its supporting libraries, and something like uWSGI if necessary.
I published one solution at https://github.com/msgre/uwsgi_logging. It is a simple Django application behind uWSGI, configured to send both the uWSGI and Django logs to the container's stdout without needing supervisord.
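That repository has its own configuration; as a rough sketch of the idea (module name, process count and port are assumptions): a uwsgi.ini with no daemonize or logto option keeps uWSGI in the foreground, so its log lines and anything Django prints end up on the container's stdout.
[uwsgi]
module = mysite.wsgi:application
master = true
processes = 2
http-socket = 0.0.0.0:8000
die-on-term = true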

I had the same problem with my Python app (Flask). The solution that worked for me was to:
Start supervisord in nodaemon mode (supervisord -n)
Redirect log to /proc/1/fd/1 instead of /dev/stdout
Set these two environment variables in my Docker image: PYTHONUNBUFFERED=True and PYTHONIOENCODING=UTF-8
Add the lines below to your supervisord config file:
redirect_stderr=true
stdout_logfile=/proc/1/fd/1
Then export these variables into the application's (Linux) environment:
$ export PYTHONUNBUFFERED=True
$ export PYTHONIOENCODING=UTF-8
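In a Docker image this boils down to something like the following sketch (the config path is an assumption): bake the environment variables in and keep supervisord as PID 1, so /proc/1/fd/1 is exactly the stream that docker logs reads.
ENV PYTHONUNBUFFERED=True \
    PYTHONIOENCODING=UTF-8
CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]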

Indeed, starting supervisord in non-daemon mode is the best solution.
You could also use volumes to mount supervisord's logs to a central place.
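For example (host path and image name are assumptions, and this assumes supervisord is configured to write its logs under /var/log/supervisor inside the container):
docker run -d -v /var/log/myapp-supervisor:/var/log/supervisor my-image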

Related

How do you debug python code with kubernetes and skaffold?

I am currently running a Django app under Python 3 on Kubernetes via skaffold dev. I have hot reload working for the Python source code. Is it currently possible to do interactive debugging with Python on Kubernetes?
For example,
def index(request):
import pdb; pdb.set_trace()
return render(request, 'index.html', {})
Usually, outside a container, hitting the endpoint will drop me in the (pdb) shell.
In the current setup, I have set stdin and tty to true in the Deployment file. The code does stop at the breakpoint but it doesn't give me access to the (pdb) shell.
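For reference, the relevant part of the container spec is just this (a trimmed sketch; names are placeholders):
spec:
  containers:
  - name: web
    image: myapp:dev
    stdin: true
    tty: true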
There is a kubectl command that allows you to attach to a running container in a pod:
kubectl attach <pod-name> -c <container-name> [-n namespace] -i -t
-i (default:false) Pass stdin to the container
-t (default:false) Stdin is a TTY
It should allow you to interact with the debugger in the container.
You may need to adjust your pod to use a debugger, so the following article might be helpful:
How to use PDB inside a docker container.
There is also the Telepresence tool, which supports a different approach to application debugging:
Using telepresence allows you to use custom tools, such as a debugger and IDE, for a local service and provides the service full access to ConfigMap, secrets, and the services running on the remote cluster.
Use the --swap-deployment option to swap an existing deployment with the Telepresence proxy. Swapping allows you to run a service locally and connect to the remote Kubernetes cluster. The services in the remote cluster can now access the locally running instance.
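For example, with the classic (pre-2.x) Telepresence CLI, swapping a deployment and running the Django dev server locally could look like this (the deployment name and run command are assumptions):
telepresence --swap-deployment myapp --run python manage.py runserver 0.0.0.0:8000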
It might be worth looking into Rookout which allows in-prod live debugging of Python on Kubernetes pods without restarts or redeploys. You lose path-forcing etc but you gain loads of flexibility for effectively simulating breakpoint-type stack traces on the fly.
This doesn't use Skaffold, but you can attach the VSCode debugger to any running Python pod with an open source project I wrote.
There is some setup involved to install it on your cluster, but after installation you can debug any pod with one command:
robusta playbooks trigger python_debugger name=myapp namespace=default
You can take a look at okteto/okteto. There's a good tutorial which explains how you can develop and debug directly on Kubernetes.

How do I get a server up and running in a virtual machine through only command line?

I wrote a server application in Python with Flask and now I would like to get it up and running on a virtual machine I have set up. Thus, I would really appreciate guidance in two areas.
How do I get a server set up so that it is perpetually running and other computers can access it? The computers are on the same network, so I don't have to worry about a domain name or anything; I am just looking for multiple devices to be able to access it. I am currently able to run the server on my local machine and everything works just fine.
I have my virtual linux machine set up remotely, so I SSH into it and do everything from command line, but I am a bit lost as to how to do the aforementioned stuff from the command line.
Any guidance/help is much appreciated! The web-searching I have done hasn't pointed me in the right direction. I apologize if any of my terminology was off (if so, please feel free to correct me so I learn!). Thank you!
For a simple setup, use systemd on Ubuntu (unit files live in /etc/systemd/system); this is probably not ideal for a production setup, though.
I do this sometimes for a Python Flask app that I'm prototyping. First, put your application code in /opt/my-app. I usually just cd /opt and git clone a repo there. Then, create a file called /etc/systemd/system/my-app.service. In that file, add the following:
[Unit]
Description=My App daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/sysconfig/my-app
# this is where your app lives
WorkingDirectory=/opt/my-app/
User=root
Group=root
Type=simple
# this starts your app
ExecStart=/usr/bin/python server.py
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
Next, paste any environment variables you have into a file called /etc/sysconfig/my-app like:
DB_HOST=localhost
DB_USER=postgres
DB_PASSWORD=postgres
DB_NAME=postgres
Then you can do:
service my-app start
service my-app stop
service my-app restart
and then you can hit the app on the server's IP and port (just as if you had run python app.py or python server.py). If it doesn't seem to work, you can check the logs for your daemon process with:
journalctl -u my-app -e
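If you also want the app to come back after a reboot, enable the unit (that is what the [Install] section above is for):
systemctl enable my-app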
I'm not sure this is the best setup for production; it's probably better to look into something like nginx. But I do this for prototypes all the time and it's pretty great.

How can I start redis queue worker without command line interface?

I use this tool http://python-rq.org/
I have a Flask app, but I could not find a way to start the rq worker other than with the rq CLI, e.g. $ rq worker
I need the worker running all the time. How can I run it as a service? I also need the service to start on boot.
You should investigate some supervisor program to control your rq worker. Take a look at supervisor or systemd. I personally use supervisord and it's pretty popular in the Python community.
This is how any supervisor program works (not to be confused with supervisord): the supervisor itself is a service (controlled by another service! e.g., systemd, initd, etc) and it runs programs specified in its configuration file. If a program exits or has issues, the supervisor will respawn it.
If you were in the docker ecosystem, it'd be simpler because docker can be your supervisor.
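As a rough sketch, a supervisord program block for an rq worker could look like this (the virtualenv path, working directory and queue name are assumptions):
[program:rq-worker]
command=/home/deploy/venv/bin/rq worker default
directory=/home/deploy/myapp
autostart=true
autorestart=true
stopsignal=TERM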
Multiple options:
For development purposes, I recommend a Docker container (https://docker.com). For production, the cleanest way is to use the packaged version for your system (assuming you're using a GNU/Linux box) along with a dedicated systemd unit.
For example, on Debian:
apt-get update && apt-get install redis
Then edit redis.conf and start it:
systemctl start redis
Enable it (i.e., start Redis at boot):
systemctl enable redis
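A dedicated unit for the worker itself could be a sketch like this (paths are assumptions), saved as /etc/systemd/system/rq-worker.service and enabled the same way with systemctl enable rq-worker:
[Unit]
Description=RQ worker
After=network.target redis.service

[Service]
WorkingDirectory=/home/deploy/myapp
ExecStart=/home/deploy/venv/bin/rq worker default
Restart=always

[Install]
WantedBy=multi-user.target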

mod_wsgi: Reload Code via Inotify - not every N seconds

Up to now I followed this advice to reload the code:
https://code.google.com/archive/p/modwsgi/wikis/ReloadingSourceCode.wiki
This has the drawback that code changes are detected only every N seconds. I could use N=0.1, but that results in useless disk I/O.
AFAIK the Linux kernel's inotify facility is available from Python.
Is there a faster way to detect code changes and restart the wsgi handler?
We use daemon mode on linux.
Why code reload for mod_wsgi at all
There is interest in why I want this at all. Here is my setup:
Most people use "manage.py runserver" for development and some other WSGI deployment for production.
In my context we have automated the creation of new systems and prod and development systems are mostly identical.
One operating system (linux) can host N systems (virtual environments).
Developers can use runserver or mod_wsgi. Using runserver has the benefit that it's easy for debugging, mod_wsgi has the benefit that you don't need to start the server first.
mod_wsgi has the benefit that you know the URL: https://dev-server/system-name/myurl/
With runserver you don't know the port. Use case: You want to link from an internal wiki to a dev-system ....
A dirty hack we used in the past to get code reload with mod_wsgi: maximum-requests=1, but this is slow.
Preliminaries.
Developers can use runserver or mod_wsgi. Using runserver has the
benefit that it's easy for debugging, mod_wsgi has the benefit that
you don't need to start the server first.
But you do: the server needs to be set up first, and that takes a lot of effort. And the server needs to be started here as well, though you can configure it to start automatically at boot.
If you are running on port 80 or 443, which is usually the case, the server can be started only by root. If it needs to be restarted, you will have to ask for the superuser's help again. So ./manage.py runserver scores heavily here.
mod_wsgi has the benefit that you know the URL:
https://dev-server/system-name/myurl/
Which is no different from the dev server. By default it starts on port 8000, so you can access it as http://dev-server:8000/system-name/myurl/. If you want to use SSL with the development server, you can use a package such as django-sslserver, or you can put nginx in front of the Django development server.
With runserver you don't know the port. Use case: You want to link from an internal wiki to a dev-system ....
With runserver, the port is well defined as mentioned above. And you can make it listen on a different port, for example with:
./manage.py runserver 0.0.0.0:9090
Note that if you put the development server behind Apache (as a reverse proxy) or nginx, the restarting problems I mentioned above do not apply.
So in short, for development work, whatever you do with mod_wsgi can be done with the Django development server (aka ./manage.py runserver).
Inotify
Here we are getting to the main topic at last. Assuming you have installed inotify-tools, you could type this into your shell; you don't need to write a script.
while inotifywait -r -e modify .; do sudo kill -2 yourpid ; done
This will result in the code being reloaded when ...
... using daemon mode with a single process you can send a SIGINT
signal to the daemon process using the ‘kill’ command, or have the
application send the signal to itself when a specific URL is
triggered.
ref: http://modwsgi.readthedocs.io/en/develop/user-guides/frequently-asked-questions.html#application-reloading
alternatively
while inotifywait -r -e modify .; do touch wsgi.py ; done
when
... using daemon mode, with any number of processes, and the process
reload mechanism of mod_wsgi 2.0 has been enabled, then all you need
to do is touch the WSGI script file, thereby updating its modification
time, and the daemon processes will automatically shutdown and restart
the next time they receive a request.
In both situations we are using the -r flag to tell inotify to monitor subdirectories. That means each time you save a .css or .js file, Apache will reload. But without the -r flag, changes to Python code in subfolders would go undetected. To have the best of both worlds, exclude css, js, images, etc. with the --exclude option.
What about when your IDE saves an auto-backup file, or vim saves a .swp file? That too will cause a code reload, so you would have to exclude those file types as well.
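For example, something along these lines (the exclude pattern is only a starting point and would need tuning for your project):
while inotifywait -r -e modify --exclude '\.(swp|swo|css|js)$' .; do touch wsgi.py ; done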
So in short, it's a lot of hard work to reproduce what the django development server does free of charge.
You can use inotify-hookable to run any command you want in response to an inotify event (here's my source link: http://terokarvinen.com/2016/when-files-change-take-action-inotify-hookable).
Once the hooks are in place, you can simply trigger a reload of the code Apache serves.
For your specific problem, it should be something like:
inotify-hookable --watch-directories sources/ --recursive --on-modify-command './code_reload.sh'
In the previous link, the command to execute was just a simple touch flask/init.wsgi
So the whole command (with ignored files added) was:
inotify-hookable --watch-directories flask/ --recursive --ignore-paths='flask/init.wsgi' --on-modify-command 'touch flask/init.wsgi'
As stated here: Flask + mod_wsgi automatic reload on source code change, if you have enabled WSGIScriptReloading, you can just touch that file. It will cause the entire code to reload (not just the config file). But, if you prefer, you can set any other script to reload the code.
After googling a bit, it seems to be a pretty standard solution for that problem and I think that you can use it for your application.

How do you manage development tasks when setting up your development environment using docker containers?

I have been researching docker and understand almost everything I have read so far. I have built a few images, linked containers together, mounted volumes, and even got a sample django app running.
The one thing I cannot wrap my head around is setting up a development environment. The whole point of Docker is to be able to take your environment anywhere, so that everything you do is portable and consistent. If I am running a Django app in production served by gunicorn, for example, I need to restart the server for my code changes to take effect; that is not ideal when you are working on the project on your local laptop. If I make a change to my models or views, I don't want to have to attach to the container, stop gunicorn, and restart it every time I make a code change.
I am also not sure how I would run management commands. python manage.py syncdb would require me to get inside the container and run commands. I also use South to manage data and schema migrations (python manage.py migrate). How are others dealing with this issue?
Debugging is another issue. Would I have to somehow get all my logs to save somewhere so I can look at things? I usually just look at the django development server's output to see errors and prints.
It seems that I would have to make a special dev-environment container that had a bunch of workarounds; that seems like it completely defeats the purpose of the tool in the first place though. Any suggestions?
Update after doing more research:
Thanks for the responses. They set me on the right path.
I ended up discovering fig (http://www.fig.sh/). It lets you orchestrate container linking and volume mounting, and you can run commands through it, e.g. fig run container_name python manage.py syncdb. It seems pretty nice and I have been able to set up my dev environment using it.
I made a diagram of how I set it up using Vagrant (https://www.vagrantup.com/).
I just run
fig up
in the same directory as my fig.yml file, and it does everything needed to link the containers and start the server. When working on my Mac I just run the development server, so it restarts when I change Python code.
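For reference, a fig.yml for this kind of setup looks roughly like this (image, ports and paths are assumptions):
web:
  build: .
  command: python manage.py runserver 0.0.0.0:8000
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - db
db:
  image: postgres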
At my current gig we set up a bash script called django_admin. You run it like so:
django_admin <management command>
Example:
django_admin syncdb
The script looks something like this:
docker run -it --rm \
-e PYTHONPATH=/var/local \
-e DJANGO_ENVIRON=LOCAL \
-e LC_ALL=en_US.UTF-8 \
-e LANG=en_US.UTF-8 \
-v /src/www/run:/var/log \
-v /src/www:/var/local \
--link mysql:db \
localhost:5000/www:dev /var/local/config/local/django-admin "$@"
I'm guessing you could also hook something up like this to manage.py
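A thin wrapper for manage.py would just swap the final command, along these lines (hypothetical, reusing the same image and mounts as the script above):
docker run -it --rm \
    -e PYTHONPATH=/var/local \
    -e DJANGO_ENVIRON=LOCAL \
    -v /src/www:/var/local \
    --link mysql:db \
    localhost:5000/www:dev python /var/local/manage.py "$@"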
I normally wrap my actual CMD in a script that launches a bash shell. Take a look at Docker-Jetty container as an example. The final two lines in the script are:
/opt/jetty/bin/jetty.sh restart
bash
This will start jetty and then open a shell.
Now I can use the following command to enter a shell inside the container and run any commands or look at logs. Once I am done I can use Ctrl-p + Ctrl-q to detach from the container.
docker attach CONTAINER_NAME
