Python script to add data to postgres docker container runs multiple times - python

I'm trying to find a good way to populate a database with initial data for a simple application. I'm using a tutorial from realpython.com as a starting point. I then run a simple Python script after the database is created to add a single entry, but when I do this the entry is added multiple times even though I only call the script once.
population script (test.py):
from app import db
from models import *

# create a single Post entry and commit it to the database
t = Post("Hello 3")
db.session.add(t)
db.session.commit()
Edit:
Here is the docker-compose file which I use to build the project:
web:
  restart: always
  build: ./web
  expose:
    - "8000"
  links:
    - postgres:postgres
  volumes:
    - /usr/src/app/static
  env_file: .env
  command: /usr/local/bin/gunicorn -w 2 -b :8000 app:app
nginx:
  restart: always
  build: ./nginx/
  ports:
    - "80:80"
  volumes:
    - /www/static
  volumes_from:
    - web
  links:
    - web:web
data:
  restart: always
  image: postgres:latest
  volumes:
    - /var/lib/postgresql
  command: "true"
postgres:
  restart: always
  image: postgres:latest
  volumes_from:
    - data
  ports:
    - "5432:5432"
It references two different Dockerfiles:
Dockerfile #1, which builds the app container, is one line:
FROM python:3.4-onbuild
Dockerfile #2 is used to build the nginx container:
FROM tutum/nginx
RUN rm /etc/nginx/sites-enabled/default
ADD sites-enabled/ /etc/nginx/sites-enabled
Edit 2:
Some people have suggested that the data was persisting across several runs, and that was my initial thought as well. This is not the case, as I remove all active Docker containers via docker rm before testing. Also, the number of "extra" entries is not consistent, ranging randomly from 3 to 6 in the few tests I have run so far.

It turns out this is a bug related to using the run command on containers that have the restart: always instruction in the docker-compose file. To work around the issue until the bug is fixed, I removed restart: always from the web container.
related issue: https://github.com/docker/compose/issues/1013
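For reference, a minimal sketch of the web service with the workaround applied; it is identical to the service above except that the restart: always line is dropped:
web:
  build: ./web
  expose:
    - "8000"
  links:
    - postgres:postgres
  volumes:
    - /usr/src/app/static
  env_file: .env
  command: /usr/local/bin/gunicorn -w 2 -b :8000 app:app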


Celery Tasks are not getting added to database

I am trying to run my Django application, which involves Celery, using Docker. I am able to set everything up locally and it works perfectly fine. However, when I run it in Docker and my task gets executed, it throws the following error:
myapp.models.mymodel.DoesNotExist: mymodel matching query does not exist.
I am fairly new to Celery and Docker, so I'm not sure what I am doing wrong.
Celery is set up correctly; I have made sure of that. The following are the broker URL and backend settings:
CELERY_BROKER_URL = 'redis://redis:6379/0'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_BACKEND = 'django-db'
This is my docker-compose.yml file:
version: "3.8"
services:
redis:
image: redis:alpine
container_name: rz01
ports:
- "6379:6379"
networks:
- npm-nw
- braythonweb-network
braythonweb:
build: .
command: >
sh -c "python manage.py makemigrations &&
python manage.py migrate &&
gunicorn braython.wsgi:application -b 0.0.0.0:8000 --workers=1 --timeout 10000"
volumes:
- .:/code
ports:
- "8000:8000"
restart: unless-stopped
env_file: .env
networks:
- npm-nw
- braythonweb-network
celery:
build: .
restart: always
container_name: cl01
command: celery -A braython worker -l info
depends_on:
- redis
networks:
- npm-nw
- braythonweb-network
networks:
braythonweb-network:
npm-nw:
external: false
I have tried a few things from different Stack Overflow posts, like apply_async. I have also made sure that my model exists.
Update: on further investigation I noticed that the Celery task does not get created in the database in the first place. I don't know why; maybe I have to replace the following with something else:
CELERY_RESULT_BACKEND = 'django-db'
The exception is telling you that you are looking for an entry in your database that does not exist (yet). Look at any function where you query the database and make sure you create the needed entry before looking for it. I'm assuming you have a table in your database for some configuration that is read in a function, but the database is empty at the beginning.
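As a minimal sketch of that kind of guard inside a task (the task name, argument and lookup are assumptions for illustration; only the model name is taken from the traceback above):
# tasks.py -- illustrative only; process_entry and entry_id are hypothetical
from celery import shared_task

from myapp.models import mymodel


@shared_task
def process_entry(entry_id):
    # Guard the lookup so a row that does not exist yet doesn't crash the worker
    # with mymodel.DoesNotExist.
    try:
        entry = mymodel.objects.get(pk=entry_id)
    except mymodel.DoesNotExist:
        return f"mymodel {entry_id} does not exist yet, skipping"
    # ... do the actual work with entry here ...
    return entry.pk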
I had to add the following to the celery container as well, to give it access to the code:
volumes:
  - .:/code
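In context, the celery service from the compose file above would then look roughly like this (only the volumes lines are new):
celery:
  build: .
  restart: always
  container_name: cl01
  command: celery -A braython worker -l info
  volumes:
    - .:/code
  depends_on:
    - redis
  networks:
    - npm-nw
    - braythonweb-network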

Best way to run multiple flask apps in a single docker container

On a T2-Micro instance on AWS/EC2, I have built four Docker containers, as shown in the .yaml file below.
These are:
Nginx
economy (app1)
elections (app2)
social (app3)
There is a gunicorn web server in each of the three app containers, each serving one Flask app. These are Plotly Dash apps.
As one might see, this takes a container for each app, which gets bulky after three and starts to consume too much memory on the T2-Micro instance.
What would be ideal is if each app container (i.e. economy, elections, social, etc.) could hold multiple Flask apps, using port iteration such as 5000, 5001, 5002, etc. They would all be addressable by unique port numbers enumerated in the .yaml file.
Using single containers with a single stack of gunicorn, Flask and dependent packages would reduce the memory requirements of individual containers, allowing me to load more apps onto a single EC2 instance.
The .yaml file below:
version: '2.1'
services:
  economy:
    container_name: economy
    hostname: economy
    restart: always
    build: economy
    networks:
      tsworker-net:
    expose:
      - "8000"
    volumes:
      - ./data:/tmp/data:ro
    command: gunicorn -w 1 -b :8000 economy:server
  elections:
    container_name: elections
    hostname: elections
    restart: always
    build: elections
    networks:
      tsworker-net:
    expose:
      - "8500"
    volumes:
      - ./data:/tmp/data:ro
      - ./assets:/tmp/assets:ro
    environment:
      - FLASK_ENV=development
    command: gunicorn --log-level debug -w 1 -b :8500 elections:server
  social:
    container_name: social
    hostname: social
    restart: always
    build: social
    networks:
      tsworker-net:
    expose:
      - "9000"
    volumes:
      - ./data:/tmp/data:ro
    command: gunicorn -w 1 -b :9000 social:server # was 8000
  nginx:
    image: nginx:1.15
    container_name: nginx
    hostname: nginx
    restart: unless-stopped
    networks:
      tsworker-net:
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./nginx/nginx.http.conf:/etc/nginx/conf.d/default.conf:ro
      - /etc/letsencrypt/etc:/etc/letsencrypt
      - /etc/letsencrypt/www:/var/www/letsencrypt
    environment:
      - TZ=UTC
    depends_on:
      - economy
      - elections
      - social
networks:
  tsworker-net:
    driver: bridge
Any help with this will be highly appreciated.
I agree that this breaks the Docker principle, but I've used supervisord to run multiple services in a single container in the past with some success. It was a pain to troubleshoot when things went wrong, so I ended up using several containers by the end of the project.
Documentation here: https://docs.docker.com/config/containers/multi-service_container/
The Docker principle is one service per container, so having multiple containers for multiple instances is not bad thinking. If you want to reduce resource usage, try using an Alpine base image in your Dockerfiles. Anyway, as far as I know a container's own memory overhead is very low, if not negligible; the main source of usage is the app itself.
What you are describing sounds like scaling the services manually, instead of using "docker-compose up --scale" (https://docs.docker.com/compose/reference/up/).
You could change the command to a supervisord that runs gunicorn multiple times and expose the ports manually in the docker-compose file... but that's a bit unusual in the Docker "way of doing things".
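As a rough sketch of that supervisord idea, a hypothetical supervisord.conf baked into a single app image could look like this (the program names reuse the app modules from the compose file above; the ports are illustrative):
[supervisord]
nodaemon=true

[program:economy]
command=gunicorn -w 1 -b :8000 economy:server

[program:elections]
command=gunicorn -w 1 -b :8500 elections:server

[program:social]
command=gunicorn -w 1 -b :9000 social:server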
You may try adding "scale: 3" to one service and see if it works well for you. Just note that using scale is not compatible with container_name, because it would try to scale the name too.
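For illustration, a scaled service fragment could look roughly like this, assuming the compose file format is bumped to at least 2.2 (where the service-level scale key is available) and container_name is removed:
economy:
  build: economy
  scale: 3
  expose:
    - "8000"
  command: gunicorn -w 1 -b :8000 economy:server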
Hope it helps!

What is the proper way to setup a simple docker-compose configuration for testing?

My current docker-compose.yml file:
version: '2'
services:
  app:
    restart: always
    build: ./web
    ports:
      - "8000:8000"
    volumes:
      - ./web:/app/web
    command: /usr/local/bin/gunicorn -w 3 -b :8000 project:create_app()
    environment:
      FLASK_APP: project/__init__.py
    depends_on:
      - db
    working_dir: /app/web
  db:
    image: postgres:9.6-alpine
    restart: always
    volumes:
      - dbvolume:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
volumes:
  dbvolume:
I'm now trying to create a docker-compose-test.yml file that overrides the previous file for testing. What came to my mind was to use this:
version: '2'
services:
  app:
    command: pytest
  db:
    volumes:
      - dbtestvolume:/var/lib/postgresql/data
volumes:
  dbtestvolume:
And then run the tests with the command:
docker-compose -f docker-compose.yml -f docker-compose-test.yml run --rm app
which, as far as I understand, should override only the aspects that differ from the compose file used for development, that is, the command used and the data volume where the data is stored.
The command is successfully overridden, but unfortunately the data volume stays the same, and so my application's data gets overwritten when I run my tests.
Is this the correct way to set up a docker configuration for the tests? Any suggestion about what is going wrong?
If this is not the correct way, what is the proper way to setup a docker-compose configuration for testing?
Alternative test
I tried to change my docker-compose-test.yml file to use a different service (db-test) for testing:
version: '2'
services:
  app:
    command: pytest
    depends_on:
      - db-test
  db-test:
    image: postgres:9.6-alpine
    restart: always
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
What happens now is that the data is not overwritten (so, in a way, it works, hurray!) when I run my tests, but if I then try to run the command:
docker-compose down
I get this output:
Stopping app_app_1 ... done
Stopping app_db_1 ... done
Found orphan containers (app_db-test_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
and then the docker-compose down fails. So something is not configured properly.
Any idea?
If you don't want to persist the DB data, don't use volumes, so you will have a fresh database every time you start the container.
I guess you need some prepopulated data in your tables, so just build a new DB image that copies in the data you need. The Dockerfile could be something like:
FROM postgres:9.6-alpine
COPY db-data/ /var/lib/postgresql/data
In case you need to update the data, mount db-data/ using -v, change it, and rebuild the image.
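One way to wire that in is to give the tests their own standalone compose file rather than layering an override on the development one, so the development dbvolume is never inherited. A minimal sketch, where the db-test directory holding the Dockerfile above is an assumption:
# docker-compose-test.yml -- standalone file, not an override
version: '2'
services:
  app:
    build: ./web
    command: pytest
    working_dir: /app/web
    environment:
      FLASK_APP: project/__init__.py
    depends_on:
      - db
  db:
    build: ./db-test   # directory containing the Dockerfile above; no volume, fresh data each run
You would then run it with docker-compose -f docker-compose-test.yml run --rm app.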
By the way, it would be better to use an automated pipeline to test your builds, using Jenkins, GitLab CI, Travis or whatever solution suits you. Anyway, you can use docker-compose in your pipeline as well to keep it consistent with your local development environment.

Slow django model instance creation with Docker

I have a Django application with a model. I have a manage.py command that creates n model instances and saves them to the database. It runs at decent speed on my host machine.
But if I run it in Docker it runs very slowly: one instance is created and saved in 40-50 seconds. I think I am missing something about how Docker works; can somebody point out why the performance is so low and what I can do about it?
docker-compose.yml:
version: '2'
services:
  db:
    restart: always
    image: "postgres:9.6"
    ports:
      - "5432:5432"
    volumes:
      - /usr/local/var/postgres:/var/lib/postgresql
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=my_db
      - POSTGRES_USER=postgres
  web:
    build: .
    command: bash -c "./wait-for-it.sh db:5432 --timeout=15; python manage.py migrate; python manage.py runserver 0.0.0.0:8000; python manage.py mock 5"
    ports:
      - "8000:8000"
    expose:
      - "8000"
    depends_on:
      - db
Dockerfile for the web service:
FROM python:3.6
ENV PYTHONUNBUFFERED 1
ADD . .
WORKDIR .
RUN pip install -r requirements.txt
RUN chmod +x wait-for-it.sh
The problem here is most likely the volume /usr/local/var/postgres:/var/lib/postgresql, as you are using it on a Mac. As I understand the Docker for Mac solution, it uses file sharing to implement host volumes, which is a lot slower than native filesystem access.
A possible workaround is to use a docker volume instead of a host volume. Here is an example:
version: '2'
volumes:
  postgres_data:
services:
  db:
    restart: always
    image: "postgres:9.6"
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=my_db
      - POSTGRES_USER=postgres
  web:
    build: .
    command: bash -c "./wait-for-it.sh db:5432 --timeout=15; python manage.py migrate; python manage.py runserver 0.0.0.0:8000; python manage.py mock 5"
    ports:
      - "8000:8000"
    expose:
      - "8000"
    depends_on:
      - db
Please note that this may complicate management of the Postgres data, as you can't simply access the data from your Mac. You can only use the Docker CLI or containers to access, modify and back up this data. Also, I'm not sure what happens if you uninstall Docker from your Mac; you may lose this data.
Two things can be a probable cause:
1. Starting a Docker container takes some time, so if you start a new container for each instance this can add up.
2. What storage driver do you use? Docker sometimes defaults to the devicemapper loopback storage driver, which is slow. This will be painful, especially if you start this container often. (A quick way to check the driver is sketched below.)
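To check which storage driver your Docker daemon is using, something like this should work on a standard install:
docker info | grep -i "storage driver"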
Other than that, your config looks sensible, and there are no obvious problems there. So if the above two points don't apply to you, please add some extra details, like how you actually add these model instances.

Multiple Python Scripts in Docker

This is quite a basic question, but I haven't been able to get an answer from researching on Google, although I think that is more due to my lack of understanding than the answer not being out there.
I am getting to grips with Docker and have a Python Flask-Admin script and a Postgres database in two separate containers, but under one docker-compose file. I would like another Python script, which will be scraping a website, to run at the same time. I have the file all set up, but how do I include it in the same docker-compose file or Dockerfile?
version: '2'
services:
  db:
    image: postgres
    environment:
      - PG_PASSWORD=XXXXX
  dev:
    build: .
    volumes:
      - ./app:/code/app
      - ./run.sh:/code/run.sh
    ports:
      - "5000:5000"
    depends_on:
      - db
Exactly what to write depends on your directory configuration, but you basically want
version: '2'
services:
  db:
    image: postgres
    environment:
      - PG_PASSWORD=XXXXX
  dev:
    build: <path-to-dev>
    volumes:
      - ./app:/code/app
      - ./run.sh:/code/run.sh
    ports:
      - "5000:5000"
    depends_on:
      - db
  scraper:
    build: <path-to-scraper>
    depends_on:
      - db
The two paths might be the same. You might push an image and then reference that instead of building it on the fly. You might do the same business of just mounting the code directory instead of building it into the image (but don't do that for actual deployment).
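For completeness, a minimal sketch of what a Dockerfile in <path-to-scraper> could contain; the scrape.py and requirements.txt names are assumptions for illustration:
FROM python:3.6
WORKDIR /code
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY scrape.py .
CMD ["python", "scrape.py"]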
