Passing Azure secret variables to pytest in a pipeline? - python

We are running integration tests, written in Python, in an Azure Pipeline. These tests access a database, and the credentials for accessing the database are stored in a variable group in Azure, including secret variables. This is the part of the YAML file where the integration tests are started:
jobs:
- job: IntegrationTests
  variables:
  - group: <some_variable_group>
  steps:
  - script: |
      pdm run pytest \
        --variables "$VARIABLE_FILE" \
        --test-run-title="$TEST_TITLE" \
        --napoleon-docstrings \
        --doctest-modules \
        --color=yes \
        --junitxml=junit/test-results.xml \
        integration
    env:
      DB_USER: $(SMDB_USER)
      DB_PASSWORD: $(SMDB_PASSWORD)
      DB_HOST: $(SMDB_HOST)
      DB_DATABASE: $(SMDB_DATABASE)
The problem is that we cannot read the value of SMDB_PASSWORD, as it is a secret variable. To use secret variables, it is advised to pass them as arguments to a PythonScript task (as in: Passing arguments to python script in Azure Devops),
but I am not aware how to rewrite this script as a PythonScript task, since it relies on pdm.

Actually, according to the docs, they should be available as env variables: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-secret-variables?view=azure-devops&tabs=yaml%2Cbash#use-a-secret-variable-in-the-ui
import os
os.environ.get('DB_USER')
Edit: repro:
python -c "import os, base64; print(base64.b64encode(bytes(os.environ.get('TEST_PLAIN'), 'ascii')))"

Related

GCP Dataflow custom template creation

I am trying to create a custom template in Dataflow, so that I can run it from a Composer DAG on a set frequency. I understand that I need to deploy my Dataflow template first, and then the DAG.
I have used this example - https://cloud.google.com/dataflow/docs/guides/templates/creating-templates#:~:text=The%20following%20example%20shows%20how%20to%20stage%20a%20template%20file%3A
My code:
- python3 -m job.process_file \
--runner DataflowRunner \
--project project \
--staging_location gs://bucketforjob/staging \
--temp_location gs://bucketforjob/temp \
--template_location gs://bucketfordataflow/templates/df_job_template.py \
--region eu-west2 \
--output_bucket gs://cleanfilebucket \
--output_name file_clean \
--input gs://rawfilebucket/file_raw.csv
The issue I am having is that it just tries to run the pipeline (the input file doesn't exist in the bucket yet, and I don't want it to randomly process it by putting it in there), so it fails, saying that file_raw.csv doesn't exist in the bucket. How do I get it to just create/compile the pipeline as a template in the template folder, for me to call on with my DAG?
This is really confusing me, and there seems to be little guidance out there from what I could find... Any help would be appreciated.
I think you would like to separate the template creation command from the job execution.
The example on the page you provided shows the necessary parameters...
python -m examples.mymodule \
  --runner DataflowRunner \
  --project PROJECT_ID \
  --staging_location gs://BUCKET_NAME/staging \
  --temp_location gs://BUCKET_NAME/temp \
  --template_location gs://BUCKET_NAME/templates/TEMPLATE_NAME \
  --region REGION
where examples.mymodule is the pipeline source code (as I understand it), and --template_location gs://BUCKET_NAME/templates/TEMPLATE_NAME is the place where the resulting template is to be stored.
To execute the job, you might run a command following the Running classic templates using gcloud documentation example...
gcloud dataflow jobs run JOB_NAME \
  --gcs-location gs://YOUR_BUCKET_NAME/templates/MyTemplate \
  --parameters inputFile=gs://YOUR_BUCKET_NAME/input/my_input.txt,outputFile=gs://YOUR_BUCKET_NAME/output/my_output
Or, in your case, you probably would like to start the job Using the REST API
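For the REST API route, a rough sketch using the Google API Python client (google-api-python-client) to launch an already staged classic template; the project, region, job name, and parameter names are placeholders and must match runtime (ValueProvider) options defined in your pipeline:
# Sketch: launch a staged classic template via the Dataflow REST API (v1b3).
# Assumes application default credentials are available.
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")

request = dataflow.projects().locations().templates().launch(
    projectId="project",          # placeholder
    location="europe-west2",      # placeholder region
    gcsPath="gs://bucketfordataflow/templates/df_job_template.py",
    body={
        "jobName": "process-file-run",   # placeholder
        "parameters": {
            # Only options exposed as runtime (ValueProvider) parameters can be set here.
            "input": "gs://rawfilebucket/file_raw.csv",
        },
    },
)
response = request.execute()
print(response["job"]["id"])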
In any case, don't forget about the relevant IAM roles and permissions for the service accounts under which the job is to run.

Cloud Build env variables not passed to Django app on GAE

I have a Django app running on the Google App Engine Standard environment. I've set up a Cloud Build trigger from my master branch in GitHub to run the following steps:
steps:
- name: 'python:3.7'
  entrypoint: python3
  args: ['-m', 'pip', 'install', '--target', '.', '--requirement', 'requirements.txt']
- name: 'python:3.7'
  entrypoint: python3
  args: ['./manage.py', 'collectstatic', '--noinput']
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['app', 'deploy', 'app.yaml']
  env:
  - 'SHORT_SHA=$SHORT_SHA'
  - 'TAG_NAME=$TAG_NAME'
I can see under the Execution Details tab on Cloud Build that the variables were actually set.
The problem is that SHORT_SHA and TAG_NAME aren't accessible from my Django app (I followed the instructions at https://cloud.google.com/cloud-build/docs/configuring-builds/substitute-variable-values#using_user-defined_substitutions). But if I set them in my app.yaml file with hardcoded values under env_variables, my Django app can access those hardcoded values (and the values set in my build don't overwrite the ones hardcoded in app.yaml).
Why is this? Am I accessing them/setting them incorrectly? Should I be setting them in app.yaml somehow?
I even printed the whole os.environ dictionary in one of my views to see if they were just there with different names or something, but they're not present in there.
Not the cleanest solution, but I used this Medium post as guidance for my solution. I hypothesize that the runserver command isn't being passed those env variables, and that they are only available to the app deploy command.
Write a Python script that dumps the current environment variables into a .env file in the project dir (a sketch of such a script is shown at the end of this answer).
In your settings file, read the env variables from the .env file (I used the django-environ library for this).
Add a step to the cloud build file that runs your new Python script, and pass the env variables in that step (you're essentially dumping these variables into a .env file in this step):
- name: 'python:3.7'
  entrypoint: python3
  args: ['./create_env_file.py']
  env:
  - 'SHORT_SHA=$SHORT_SHA'
  - 'TAG_NAME=$TAG_NAME'
Set the variables through the Substitution Variables section on the Edit Trigger page in Cloud Build.
Now your application should have these env variables available when app deploy happens.
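For reference, a minimal sketch of what create_env_file.py might look like; the file name and the list of variables are illustrative:
# create_env_file.py (sketch): dump selected build-time variables into a .env
# file that the Django settings can later read via django-environ.
import os

WANTED = ["SHORT_SHA", "TAG_NAME"]

with open(".env", "w") as env_file:
    for name in WANTED:
        env_file.write("{}={}\n".format(name, os.environ.get(name, "")))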

Is it possible to run / serialize Dataflow job without having all dependencies locally?

I have created a pipeline for Google Cloud Dataflow using Apache Beam, but I cannot have the Python dependencies locally. However, there is no problem installing those dependencies remotely.
Is it somehow possible to run the job or create a template without executing Python code in my local (development) environment?
Take a look at this tutorial. Basically, you write the Python pipeline, then deploy it via the command line with:
python your_pipeline.py \
--project $YOUR_GCP_PROJECT \
--runner DataflowRunner \
--temp_location $WORK_DIR/beam-temp \
--setup_file ./setup.py \
--work-dir $WORK_DIR
The crucial part is --runner DataflowRunner, so that it uses Google Cloud Dataflow (and not your local installation) to run the pipeline. Obviously, you have to set up your Google account and credentials.
Well, I am not 100% sure that this is possible, but you may:
Define a requirements.txt file with all of the dependencies for pipeline execution
Avoid importing and using your dependencies at pipeline-construction time; use them only in execution-time code.
So, for instance, your file may look like this:
import apache_beam as beam

with beam.Pipeline(...) as p:
    result = (p | ReadSomeData(...)
                | beam.ParDo(MyForbiddenDependencyDoFn()))
And in the same file, your DoFn would import your dependency from within the pipeline's execution-time code, i.e. the process method, for instance. See:
class MyForbiddenDependencyDoFn(beam.DoFn):
    def process(self, element):
        import forbidden_dependency as fd
        yield fd.totally_cool_operation(element)
When you execute your pipeline, you can do:
python your_pipeline.py \
--project $GCP_PROJECT \
--runner DataflowRunner \
--temp_location $GCS_LOCATION/temp \
--requirements_file=requirements.txt
I have never tried this, but it just may work : )

How to test dockerized flask app using Pytest?

I've built a Flask app that is designed to run inside a Docker container. It
accepts POST HTTP methods and returns the appropriate JSON response if the header key matches the key that I put inside the docker-compose environment.
...
environment:
- SECRET_KEY=fakekey123
...
The problem arises when it comes to testing: the app, or the Flask client fixture
(pytest), of course can't find the docker-compose environment, because the app wasn't started from docker-compose but from pytest.
secret_key = os.environ.get("SECRET_KEY")
# ^^ the key loaded to OS env by docker-compose
post_key = headers.get("X-Secret-Key")
...
if post_key == secret_key:
    RETURN APPROPRIATE RESPONSE
.....
What is the (best/recommended) approach to this problem?
I found some plugins (one, two, three) to do this, but I'm asking here whether there is a more "simple"/"common" approach, because I also want to automate this test using CI/CD tools.
You most likely need to run py.test from inside your container. If you are running locally, then there's going to be a conflict between what your host machine sees and what your container sees.
So option #1 would be to use docker exec:
$ docker exec -it $containerid py.test
Then option #2 would be to create a script or task in your setup.py so that you can run a simpler command like:
$ python setup.py test
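One way this has commonly been wired up is with the pytest-runner plugin plus an alias in setup.cfg; treat this as a sketch only, since pytest-runner is deprecated in favour of calling pytest directly:
# setup.py (sketch) -- enables "python setup.py pytest" via the pytest-runner plugin.
from setuptools import setup

setup(
    name="flask-app",                    # placeholder
    setup_requires=["pytest-runner"],
    tests_require=["pytest"],
)

# To also get "python setup.py test", add to setup.cfg:
#
#   [aliases]
#   test = pytest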
My current solution is to mock the function that reads the OS environment. The OS env is only loaded if the app is started using Docker, so to make the test easy, I just mock that function.
def fake_secret_key(self):
    return "ffakefake11"

def test_app(self, client):
    app.secret_key = self.fake_secret_key
    #   ^^ real func       ^^ fake func
Or another alternative is using pytest-env, as @bufh suggested in a comment.
Create a pytest.ini file, then put:
[pytest]
env =
    APP_KEY=ffakefake11
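For completeness, pytest's built-in monkeypatch fixture can also set the variable without any plugin, provided the app reads the key at request time rather than at import time. A minimal sketch; the endpoint name is illustrative:
# Sketch: set the secret via monkeypatch for a single test.
def test_post_with_secret_key(client, monkeypatch):
    monkeypatch.setenv("SECRET_KEY", "ffakefake11")

    response = client.post(
        "/some-endpoint",                        # illustrative endpoint
        headers={"X-Secret-Key": "ffakefake11"},
        json={},
    )
    assert response.status_code == 200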

Using docker environment -e variable in supervisor

I've been trying to pass an environment variable to a Docker container via the -e option. The variable is meant to be used in a supervisor script within the container. Unfortunately, the variable does not get resolved (i.e. it stays as, for instance, $INSTANCENAME). I tried ${var} and "${var}", but this didn't help either. Is there anything I can do, or is this just not possible?
The docker run command:
sudo docker run -d -e "INSTANCENAME=instance-1" -e "FOO=2" -v /var/app/tmp:/var/app/tmp -t myrepos/app:tag
and the supervisor file:
[program:app]
command=python test.py --param1=$FOO
stderr_logfile=/var/app/log/$INSTANCENAME.log
directory=/var/app
autostart=true
The variable is being passed to your container, but supervisor doesn't let you use environment variables like this inside the configuration files.
You should review the supervisor documentation, and specifically the parts about string expressions. For example, for the command option:
Note that the value of command may include Python string expressions, e.g. /path/to/programname --port=80%(process_num)02d might expand to /path/to/programname --port=8000 at runtime.
String expressions are evaluated against a dictionary containing the keys group_name, host_node_name, process_num, program_name, here (the directory of the supervisord config file), and all supervisord’s environment variables prefixed with ENV_.
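So, assuming the container is started with -e INSTANCENAME=instance-1 -e FOO=2 as in the docker run command above, the config would use the ENV_-prefixed expansion syntax; a sketch:
[program:app]
command=python test.py --param1=%(ENV_FOO)s
; ENV_ expansion in logfile paths may require a reasonably recent supervisor version.
stderr_logfile=/var/app/log/%(ENV_INSTANCENAME)s.log
directory=/var/app
autostart=true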
