Python Selenium and Docker

Python Selenium and Docker - python

I'm trying to create multiple containers with RPAs using selenium and Python, how can I do this without installing python and its libraries in each container? Like a base container with all dependencies and I can export these dependencies to the other containers. Or it cannot be done?
services:
chromedriver:
container_name: chromedriver
image: selenium/standalone-chrome:latest
shm_size: 2gb
ports:
- 4444:4444
- 5900:5900
restart: always
bank_1:
build:
dockerfile: Dockerfile-bank1
container_name: bank_1
command: python3 bank_1.py
ports:
- 8000:8000
depends_on:
- chromedriver
bank_2:
build:
dockerfile: Dockerfile-bank1
container_name: bank_2
command: python3 bank_2.py
ports:
- 8001:8001
depends_on:
- chromedriver

The usual way of doing this is to create an image and host it on DockerHub/ECR. When you change the code, you re-build the image and push a new version, meaning that the dependencies will be re-fetched once. And then your docker-compose services will reference this remote image as many times as needed.
To automate re-building the image, you can use tools like CircleCI or GitHub Actions.
(If you are only ever running this locally, then you may be able to skip the CI and DockerHub pieces and just build the image on your computer.)
Note also that you would typically not duplicate the service itself in the compose file, but rather use docker service scale or a reverse proxy like traefik to manage multiple identical instances.

Related

Docker - Build a service after the dependant service is up and running

I have a docker-compose file for a Django application.
Below is the structure of my docker-compose.yml
version: '3.8'
volumes:
pypi-server:
services:
backend:
command: "bash ./install-ppr_an_run_dphi.sh"
build:
context: ./backend
dockerfile: ./Dockerfile
volumes:
- ./backend:/usr/src/app
expose:
- 8000:8000
depends_on:
- db
pypi-server:
image: pypiserver/pypiserver:latest
ports:
- 8080:8080
volumes:
- type: volume
source: pypi-server
target: /data/packages
command: -P . -a . /data/packages
restart: always
db:
image: mysql:8
ports:
- 3306:3306
volumes:
- ~/apps/mysql:/var/lib/mysql
environment:
- MYSQL_ROOT_PASSWORD=gary
- MYSQL_PASSWORD=tempgary
- MYSQL_USER=gary_user
- MYSQL_DATABASE=gary_db
nginx:
build: ./nginx
ports:
- 80:80
depends_on:
- backend
Django app is dependent on a couple of private packages hosted on the private-pypi-server without which the app won't run.
I created a separate dockerfile for django-backend alone which install packages of requirements.txt and the packages from private-pypi-server. But the dockerfile of django-backend service is running even before the private pypi server is running.
If I move the installation of private packages to docker-compose.yml command code under django-backend service in , then it works fine. Here the issue is that, if the backend is running and I want to run some commands in django-backend(./manage.py migrat) then it says that the private packages are not installed.
Im not sure how to proceed with this, it would be really helpful If i can get all these services running at once by just running the command docker-compose up --build -d

Created a separate docker-compose for pypi-server, which will be up and running even before I build/start other services.

Have you tried adding the pipy service to depends_on of the backend app?
backend:
command: "bash ./install-ppr_an_run_dphi.sh"
build:
context: ./backend
dockerfile: ./Dockerfile
volumes:
- ./backend:/usr/src/app
expose:
- 8000:8000
depends_on:
- db
- pypi-server
Your docker-compose file begs a few questions though.
Why to install custom packages to the backend service at a run time? I can see so many problems which might arise from this such as latency during service restarts, possibly different environments between runs of the same version of the backend service, any problems with the installation would come up during the deployment bring it down, etc. Installation should be done during the build of the docker image. Could you provide your Dockerfile maybe?
Is there any reason why the pypi server has to share docker-compose with the application? I'd suggest having it in a separate deployment especially if it is to be shared among other projects.
Is the pypi server supposed to be used for anything else than a source of the custom packages for the backend service? If not then I'd consider getting rid of it / using it for the builds only.
Is there any good reason why you want to have all the ports exposed? This creates a significant attack surface. E.g. an attacker could bypass the reverse proxy and talk directly to the backend service using port 8000 or they'd be able to connect to the db on the port 3306. Nb docker-compose creates subnetworks among the containers so they can access each other's ports even if those ports are not forwarded to the host machine.
Consider using docker secrets to store db credentials.

Restart Python script inside Docker container when any file has changes

I have a simple script like
print('hey 01')
and I have dockerized it like below:
FROM python:3.8
WORKDIR /crawler_app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "-u", "boot_up.py"]
and also I have a compose file like below:
version: "3.7"
services:
db:
image: mongo:latest
container_name: ${DB_CONTAINER_NAME}
volumes:
- ./mongo-volume:/data/db
restart: always
environment:
MONGO_INITDB_ROOT_USERNAME: ${MONGO_USER}
MONGO_INITDB_ROOT_PASSWORD: ${MONGO_USER_PASS}
ports:
- ${EXPOSED_PORT}:27017
networks:
- crawl-network
crawler:
build: crawler
container_name: ${CRAWLER_APP_NAME}
restart: always
volumes:
- ./crawler:/crawler_app
networks:
- crawl-network
depends_on:
- db
networks:
crawl-network:
The Problem
The problem is that although I have a volume to inside my Docker container, and when I change code in my editor and save it, the source code inside container would update but there is no way to restart the Python script to start with new updated code.
I have searched a lot about this issue and I found some threads on GitHub and Stack Overflow but none of them was not useful to me and I got no answer from them.
My main question is, how can I restart a Python script inside container when I change the source code and save it?
I found a way that mentions to restart your container every time, but I think that there should be a simple way like something like nodemon to JavaScript.

If you want to use watchmedo you need to install library for parsing yaml files by command
python -m pip install pyyaml

What is the proper way to setup a simple docker-compose configuration for testing?

My current docker-compose.yml file:
version: '2'
services:
app:
restart: always
build: ./web
ports:
- "8000:8000"
volumes:
- ./web:/app/web
command: /usr/local/bin/gunicorn -w 3 -b :8000 project:create_app()
environment:
FLASK_APP: project/__init__.py
depends_on:
- db
working_dir: /app/web
db:
image: postgres:9.6-alpine
restart: always
volumes:
- dbvolume:/var/lib/postgresql/data
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: app
volumes:
dbvolume:
I'm now trying to create a docker-compose-test.yml file that overrides the previous file for testing. What came to my mind was to use this:
version: '2'
services:
app:
command: pytest
db:
volumes:
- dbtestvolume:/var/lib/postgresql/data
volumes:
dbtestvolume:
And then run the tests with the command:
docker-compose -f docker-compose.yml -f docker-compose-test.yml run --rm app
that as far as I understand should override only the different aspects compared to the docker-file used for development, that is the command used and the data volume where the data is stored.
The command is successfully overridden, while unfortunately the data volume stays the same and so the data of my application get overwritten if I run my tests.
Is this the correct way to set up a docker configuration for the tests? Any suggestion about what is going wrong?
If this is not the correct way, what is the proper way to setup a docker-compose configuration for testing?
Alternative test
I tried to change my docker-compose-test.yml file to use a different service (db-test) for testing:
version: '2'
services:
app:
command: pytest
depends_on:
- db-test
db-test:
image: postgres:9.6-alpine
restart: always
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: app
What happens now is that I have data is not overwritten (so, in a way, it works, hurray!) when a run my tests, but if I try to run the command:
docker-compose down
I get this ouput:
Stopping app_app_1 ... done
Stopping app_db_1 ... done
Found orphan containers (app_db-test_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
and then the docker-compose down fails. So something is not configured properly.
Any idea?

If you don't want to persist the DB data, don't use volumes, so you will have a fresh database everytime you start the container.
I guess you need some prepopulated data in your tables, so just build a new DB image copying the data you need. The Docker file could be something like:
FROM postgres:9.6-alpine
COPY db-data/ /var/lib/postgresql/data
In case you need to update the data, mount the db-data/ using -v, change it and rebuild the image.
BTW, it would be better to use an automated pipeline to test your builds, using Jenkins, GitLab CI, Travis or whatever solution that suits you. Anyway, you can use docker-compose in your pipeline as well to keep it consistent with your local development environment.

Slow django model instance creation with Docker

I have django application with some model. I have manage.py command that creates n models and saves it to db. It runs with decent speed on my host machine.
But if I run it in docker it runs very slow, 1 instance created and saved in 40-50 seconds. I think I am missing something on how Docker works, can somebody point out why performance is low and what can i do with it?
docker-compose.yml:
version: '2'
services:
db:
restart: always
image: "postgres:9.6"
ports:
- "5432:5432"
volumes:
- /usr/local/var/postgres:/var/lib/postgresql
environment:
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=my_db
- POSTGRES_USER=postgres
web:
build: .
command: bash -c "./wait-for-it.sh db:5432 --timeout=15; python manage.py migrate; python manage.py runserver 0.0.0.0:8000; python manage.py mock 5"
ports:
- "8000:8000"
expose:
- "8000"
depends_on:
- db
dockerfile for web service:
FROM python:3.6
ENV PYTHONBUFFERED 1
ADD . .
WORKDIR .
RUN pip install -r requirements.txt
RUN chmod +x wait-for-it.sh

The problem here is most likely the volume /usr/local/var/postgres:/var/lib/postgresql as you are using it on Mac. As I understand the Docker for Mac solution, it uses file sharing to implement host volumes, which is a lot slower then native filesystem access.
A possible workaround is to use a docker volume instead of a host volume. Here is an example:
version: '2'
volumes:
postgres_data:
services:
db:
restart: always
image: "postgres:9.6"
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql
environment:
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=my_db
- POSTGRES_USER=postgres
web:
build: .
command: bash -c "./wait-for-it.sh db:5432 --timeout=15; python manage.py migrate; python manage.py runserver 0.0.0.0:8000; python manage.py mock 5"
ports:
- "8000:8000"
expose:
- "8000"
depends_on:
- db
Please note that this may complicate management of the postgres data, as you can't simply access the data from your Mac. You can only use the docker CLI or containers to access, modify and backup this data. Also, I'm not sure what happens if you uninstall Docker from your Mac, it may be that you lose this data.

Two things, can be a probable cause:
Starting of docker container takes some time, so if you start new container for each instance this can add up.
What storage driver do you use? Docker (often) defaults to device mapper loopback storage driver, which is slow. Here is some context. This will be painfull especially if you start this container often.
Other than that your config looks sensibly, and there are no obvious causes problems there. So if the above two points don't apply to you, please add some extra comments --- like how you actually add these model instances.

Python script to add data to postgres docker container runs multiple times

I'm trying to find a good way to populate a database with initial data for a simple application. I'm using a tutorial from realpython.com as a starting point. I then run a simple python script after the database is created to add a single entry, but when I do this the data is added multiple times even though I only call the script once. result
population script (test.py):
from app import db
from models import *
t = Post("Hello 3")
db.session.add(t)
db.session.commit()
edit:
Here is the docker-compose file which i use to build the project:
web:
restart: always
build: ./web
expose:
- "8000"
links:
- postgres:postgres
volumes:
- /usr/src/app/static
env_file: .env
command: /usr/local/bin/gunicorn -w 2 -b :8000 app:app
nginx:
restart: always
build: ./nginx/
ports:
- "80:80"
volumes:
- /www/static
volumes_from:
- web
links:
- web:web
data:
restart: always
image: postgres:latest
volumes:
- /var/lib/postgresql
command: "true"
postgres:
restart: always
image: postgres:latest
volumes_from:
- data
ports:
- "5432:5432"
it references two different Dockerfiles:
Dockerfile #1 which builds the App container and is 1 line:
FROM python:3.4-onbuild
Dockerfile #2 is used to build the nginx container
FROM tutum/nginx
RUN rm /etc/nginx/sites-enabled/default
ADD sites-enabled/ /etc/nginx/sites-enabled
edit2:
Some people have suggested that the data was persisting over several runs, and that was my initial thought as well. This is not the case, as I remove all active docker containers via docker rm before testing. Also the number of "extra" data is not consistent, ranging randomly from 3-6 in the few tests that I have run so far.

It turns out this is a bug related to using the run command on containers with the "restart: always" instruction in the docker-compose/Dockerfile. In order to resolve this issue without a bug fix I removed the "restart: always" from the web container.
related issue: https://github.com/docker/compose/issues/1013

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.