Updating Gitlab repo file using that repo's pipeline

Updating Gitlab repo file using that repo's pipeline - python

I have a Python app that takes the value of a certificate in a Dockerfile and updates it. However, I'm having difficulty knowing how to get the app to work within Gitlab.
When I push the app with the Dockerfile to be updated I want the app to run in the Gitlab pipeline and update the Dockerfile. I'm a little stuck on how to do this. I'm thinking that I would need to pull the repo, run the app and then push back up.
Would like some advice on if this is the right approach and if so how I would go about doing so?
This is just an example of the Dockerfile to be updated (I know this image wouldn't actually work, but the app would only update the ca-certificate present in the DF:
#syntax=docker/dockerfile:1
#init the base image
FROM alpine:3.15
#define present working directory
#WORKDIR /library
#run pip to install the dependencies of the flask app
RUN apk add -u \
ca-certificates=20211220 \
git=3.10
#copy all files in our current directory into the image
COPY . /library
EXPOSE 5000
#define command to start the container, need to make app visible externally by specifying host 0.0.0.0
CMD [ "python3", "-m", "flask", "run", "--host=0.0.0.0"]
gitlab-ci.yml:
stages:
- build
- test
- update_certificate
variables:
PYTHON_IMG: "python:3.10"
pytest_installation:
image: $PYTHON_IMG
stage: build
script:
- pip install pytest
- pytest --version
python_requirements_installation:
image: $PYTHON_IMG
stage: build
script:
- pip install -r requirements.txt
unit_test:
image: $PYTHON_IMG
stage: test
script:
- pytest ./tests/test_automated_cert_checker.py
cert_updater:
image: $PYTHON_IMG
stage: update_certificate
script:
- pip install -r requirements.txt
- python3 automated_cert_updater.py
I'm aware there's a lot of repetition with installing the requirements multiple times and that this is an area for improvement. I doesn't feel like it's necessary for the app to be built into an image because it's only used for updating the DF.
requirements.txt installs pytest and BeautifulSoup4
Additional context: The pipeline that builds the Dockerimage already exists and builds successfully. I am looking for a way to run this app once a day which will check if the ca-certificate is still up to date. If it isn't then the app is run, the ca-certificate in the Dockerfile is updated and then the updated Dockerfile is re built automatically.
My thoughts are that I may need to set the gitlab-ci.yml up pull the repo, run the app (that updates the ca-certificate) and then re push it, so that a new image is built based upon the update to the certificate.
The Dockerfile shown here is just a basic example showing that the actual DF in the repo looks like.

What you probably want to do is identify the appropriate version before you build the Dockerfile. Then, pass a --build-arg with the ca-certificates version. That way, if the arg changes, then the cached layer becomes invalid and will install the new version. But if the version is the same, the cached layer would be used.
FROM alpine:3.15
ARG CA_CERT_VERSION
RUN apk add -u \
ca-certificates=$CA_CERT_VERSION \
git=3.10
# ...
Then when you build your image, you should figure out the appropriate ca-certificates version and pass it as a build-arg.
Something like:
version="$(python3 ./get-cacertversion.py)" # you implement this
docker build --build-arg CA_CERT_VERSION=$version -t myimage .
Be sure to add appropriate bits to leverage docker caching in GitLab.

Related

DockerFile for a Python script with Firefox-based Selenium web driver, Flask & BS4 dependencies

Super new to python, and never used docker before. I want to host my python script on Google Cloud Run but need to package into a Docker container to submit to google.
What exactly needs to go in this DockerFile to upload to google?
Current info:
Python: v3.9.1
Flask: v1.1.2
Selenium Web Driver: v3.141.0
Firefox Geckodriver: v0.28.0
Beautifulsoup4: v4.9.3
Pandas: v1.2.0
Let me know if further information about the script is required.
I have found the following snippets of code to use as a starting point from here. I just don't know how to adjust to fit my specifications, nor do I know what 'gunicorn' is used for.
# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.7
# Install manually all the missing libraries
RUN apt-get update
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils
# Install Chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
# Install Python dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . .
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app
# requirements.txt
Flask==1.0.2
gunicorn==19.9.0
selenium==3.141.0
chromedriver-binary==77.0.3865.40.0

Gunicorn is an application server for running your python application instance, it is a pure-Python HTTP server for WSGI applications. It allows you to run any Python application concurrently by running multiple Python processes within a single dyno.
Please have a look into the following Tutorial which explains in detail regarding gunicorn.
Regarding Cloud Run, to deploy to Cloud Run, please follow next steps or the Cloud Run Official Documentation:
1) Create a folder
2) In that folder, create a file named main.py and write your Flask code
Example of simple Flask code
import os
from flask import Flask
app = Flask(__name__)
#app.route("/")
def hello_world():
name = os.environ.get("NAME", "World")
return "Hello {}!".format(name)
if __name__ == "__main__":
app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
3) Now your app is finished and ready to be containerized and uploaded to Container Registry
3.1) So to containerize your app, you need a Dockerfile in the same directory as the source files (main.py)
3.2) Now build your container image using Cloud Build, run the following command from the directory containing the Dockerfile:
gcloud builds submit --tag gcr.io/PROJECT-ID/FOLDER_NAME
where PROJECT-ID is your GCP project ID. You can get it by running gcloud config get-value project
4) Finally you can deploy to Cloud Run by executing the following command:
gcloud run deploy --image gcr.io/PROJECT-ID/FOLDER_NAME --platform managed
You can also have a look into the Google Cloud Run Official GitHub Repository for a Cloud Run Hello World Sample.

Can't deploy container image to lambda function

I try to deploy container image to lambda function, but this error message appear
The image manifest or layer media type for the source image <image_source> is not supported.
here is my Dockerfile, i believe i have use the proper setup
FROM public.ecr.aws/lambda/python:3.8
# Install dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
# Copy function code
COPY app/* ./
# Set the CMD to your handler
CMD [ "lambda_function.lambda_handler" ]

Try by specifying the target platform of the image you build as amd64:
docker build --platform linux/amd64 . -t my_image.
I get the same error while trying to deploy a lambda based on an image that supports both linux/amd64 and linux/arm64/v8 (Apple Silicon) architectures.

If you are using buildx >= 0.10 specifying target platform does not work since it also creates multi-platform index by default.
To fix this problem set --provenance=false to docker build.
For more details please see: https://github.com/docker/buildx/issues/1509#issuecomment-1378538197

Using docker to compose a remote image with a local code base for development

I have been reading this tutorial:
https://prakhar.me/docker-curriculum/
along with other tutorials, and Docker docks and I am still not completely clear on how to do this task.
The problem
My local machine is running Mac OS X, and I would like to set up a development environment for a python project. In this project I need to run call an api from a docker repo bamos/openface. The project also has some dependencies such as yaml, etc. If I just mount my local to openface, ie:
docker run -v path/to/project:/root/project -p 9000:9000 -p 8000:8000 -t -i bamos/openface /bin/bash
Then I need to install yaml and other dependencies, and every time I exit the container the installations would be lost. Additionally, it is also much slower for some reason. So the right way to do this is using Docker compose, but I am not sure how to proceed from here.
UPDATE
In response to the comments, I will now update the problem:
Right now my Dockerfile looks like this:
FROM continuumio/anaconda
ADD . /face-off
WORKDIR /face-off
RUN pip install -r requirements.txt
EXPOSE 5000
CMD [ "python", "app.py" ]
It is important that I build from anaconda since a lot of my code will use numpy and scipy. Now I also need bamos/openface, so I tried adding that to my docker-compose.yml file:
version: '2'
services:
web:
build: .
command: python app.py
ports:
- "5000:5000"
volumes:
- .:/face-off
openface:
build: bamos/openface
However, I am getting error:
build path path/to/face-off/bamos/openface either does not exist, is not accessible, or is not a valid URL
So I need to pass bamos/openface the right way so I can build a container with it. Right now bamos/openface is listed when I do docker images.

Using GitLab CI with the Python 'onbuild' image, the packages in requirements.txt don't seem to get installed

I'm trying to familiarize myself with the Gitlab CI environment with a test project, https://gitlab.com/khpeek/CI-test. The project has the following .gitlab-ci.yml:
image: python:2.7-onbuild
services:
- rethinkdb:latest
test_job:
script:
- pytest
The problem is that the test_job job in the CI pipeline fails with the following error message:
Running with gitlab-ci-multi-runner 9.0.1 (a3da309)
on docker-auto-scale (e11ae361)
Using Docker executor with image python:2.7-onbuild ...
Starting service rethinkdb:latest ...
Pulling docker image rethinkdb:latest ...
Using docker image rethinkdb:latest ID=sha256:23ecfb08823bc5483c6a955b077a9bc82899a0df2f33899b64992345256f22dd for service rethinkdb...
Waiting for services to be up and running...
Using docker image sha256:aaecf574604a31dd49a9d4151b11739837e4469df1cf7b558787048ce4ba81aa ID=sha256:aaecf574604a31dd49a9d4151b11739837e4469df1cf7b558787048ce4ba81aa for predefined container...
Pulling docker image python:2.7-onbuild ...
Using docker image python:2.7-onbuild ID=sha256:5754a7fac135b9cae7e02e34cc7ba941f03a33fb00cf31f12fbb71b8d389ece2 for build container...
Running on runner-e11ae361-project-3083420-concurrent-0 via runner-e11ae361-machine-1491819341-82630004-digital-ocean-2gb...
Cloning repository...
Cloning into '/builds/khpeek/CI-test'...
Checking out d0937f33 as master...
Skipping Git submodules setup
$ pytest
/bin/bash: line 56: pytest: command not found
ERROR: Job failed: exit code 1
However, there is a requirements.txt in the repository with the single line pytest==3.0.7 in it. It seems to me from the Dockerfile of the python:2.7-onbuild image, however, that pip install -r requirements.txt should get run on build. So why is pytest not found?

If you look at the Dockerfile you linked to, you'll see pip install -r requirements.txt is part of an onbuild command. This is useful if you want to create a new container from that first one and install a bunch of requirements. The pip install -r requirements.txt command is therefore not executed within the container in your CI pipeline and if it were, it would be executed at the very beginning, even before your gitlab repository was cloned.
I would suggest you modify your .gitlab-ci.yml file this way
image: python:2.7-onbuild
services:
- rethinkdb:latest
test_job:
script:
- pip install -r requirements.txt
- pytest

The problem seems to be intermittent: although the first time it took 61 minutes to run the tests (which initially failed), now it takes about a minute (see screen grab below).
For reference, the testing repository is at https://gitlab.com/khpeek/CI-test. (I had to add a before_script with some pip installs to make the job succeed).

What is a good way to add python dependencies to a Docker container?

I am trying to integrate docker in to my django workflow and I have everything set up except one really annoying issue. If I want to add dependencies to my requirements.txt file I basically just have to rebuild the entire container image for those dependencies to stick.
For example, I followed the docker-compose example for django here. the yaml file is set up like this:
db:
image: postgres
web:
build: .
command: python manage.py runserver 0.0.0.0:8000
volumes:
- .:/code
ports:
- "8000:8000"
links:
- db
and the Docker file used to build the web container is set up like this:
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/
So when the image is built for this container requirements.txt is installed with whatever dependencies are initially in it.
If I am using this as my development environment it becomes very difficult to add any new dependencies to that requirements.txt file because I will have to rebuild the container for the changes in requirements.txt to be installed.
Is there some sort of best practice out there in the django community to deal with this? If not, I would say that docker looks very nice for packaging up an app once it is complete, but is not very good to use as a development environment. It takes a long time to rebuild the container so a lot of time is wasted.
I appreciate any insight . Thanks.

You could mount requirements.txt as a volume when using docker run (untested, but you get the gist):
docker run container:tag -v /code/requirements.txt ./requirements.txt
Then you could bundle a script with your container which will run pip install -r requirements.txt before starting your application, and use that as your ENTRYPOINT. I love the custom entrypoint script approach, it lets me do a little extra work without needing to make a new container.
That said, if you're changing your dependencies, you're probably changing your application and you should probably make a new container and tag it with a later version, no? :)

So I changed the yaml file to this:
db:
image: postgres
web:
build: .
command: sh startup.sh
volumes:
- .:/code
ports:
- "8000:8000"
links:
- db
I made a simple shell script startup.sh:
#!/bin/bash
#restart this script as root, if not already root
[ `whoami` = root ] || exec sudo $0 $*
pip install -r dev-requirements.txt
python manage.py runserver 0.0.0.0:8000
and then made a dev-requirements.txt that is installed by the above shell script as sort of a dependency staging environment.
when I am satisfied with a dependency in dev-requirements.txt I will just move it over to the requirements.txt to be committed to the next build of the image. This gives me flexibility to play with adding and removing dependencies while developing.

I think the best way is to ignore what's currently the most common way to install python dependencies (pip install -r requirements.txt) and specify your requirements directly in the Dockerfile, effectively getting rid of the requirements.txt file. Additionally you get dockers layer caching for free.
FROM python:2.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
# make sure you install requirements before the ADD, since everything after ADD is not cached
RUN pip install flask==0.10.1
RUN pip install sqlalchemy==1.0.6
...
ADD . /code/
If the docker container is the only way your application is ever run, then I would suggest you do it this way. If you want to support other means of setting up your code (e.g. virtualenv) then this is of course not for you and you should fall back to either using a requirements file or use a setup.py routine. Either way, I found this way to be most simple and straightforward without dealing with all the messed up python package distribution issues.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.