How to provide credentials in Apache Beam Python programmatically?

We are using Apache Beam through Airflow. The default GCS account is set with the environment variable GOOGLE_APPLICATION_CREDENTIALS. We don't want to change this environment variable, as it might affect other processes running at the same time. I couldn't find a way to change the Google Cloud Dataflow service account programmatically.
We are creating the pipeline in the following way:
p = beam.Pipeline(argv=self.conf)
Is there any option through argv or options where I can specify the location of the GCS credentials file?
I searched through the documentation but didn't find much information.

You can specify a service account when you launch the job with a basic flag:
--serviceAccount=my-service-account-name@my-project.iam.gserviceaccount.com
That account will need the Dataflow Worker role attached, plus whatever else you would like (GCS/BQ/etc.). Details here. You don't need the SA to be stored in GCS, or keys stored locally, to use it.
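For the Python SDK specifically, the matching pipeline option is --service_account_email, which you can pass through the same argv/options you already build the pipeline from. A minimal sketch, assuming a Dataflow job and placeholder project, region, bucket and account names:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values throughout; --service_account_email tells Dataflow which
# service account the job's workers should run as.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--service_account_email=my-service-account-name@my-project.iam.gserviceaccount.com",
])
p = beam.Pipeline(options=options)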

Related

Specify GOOGLE APPLICATION CREDENTIALS in Airflow

So I am trying to orchestrate a workflow in Airflow. One task is to read from GCP Cloud Storage, which requires me to specify the Google Application Credentials.
I decided to create a new folder in the dag folder and put the JSON key there. Then I specified this in the dag.py file:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "dags\support\keys\key.json"
Unfortunately, I am getting the error below:
google.auth.exceptions.DefaultCredentialsError: File dags\support\keys\dummy-surveillance-project-6915f229d012.json was not found
Can anyone help with how I should go about declaring the service account key?
Thank you.
You can create a connection to Google Cloud from the Airflow webserver Admin menu. In this menu you can pass the service account key file path in the Keyfile Path field, for example /usr/local/airflow/dags/gcp.json.
If Airflow runs in a Docker container, you first need to mount your key file as a volume in the container at that path.
You can also paste the key's JSON content directly into the connection, in the Keyfile JSON field.
You can check the following links:
Airflow-connections
Airflow-with-google-cloud
Airflow-composer-managing-connections
If you are trying to download data from Google Cloud Storage using Airflow, you should use the GCSToLocalFilesystemOperator described here. It is already provided as part of the Airflow Google provider package (if you installed it), so you don't have to write the code yourself with the PythonOperator.
Also, if you use this operator you can enter the GCP credentials in the connections screen (where they should be). This is a better approach than putting your credentials in a folder with your DAGs, as that could lead to your credentials being committed into your version control system, which could cause security issues.
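A minimal sketch of that operator in a DAG, assuming the Google provider package is installed, using the default google_cloud_default connection and placeholder bucket/object/file names:
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_local import GCSToLocalFilesystemOperator

with DAG("gcs_download_example", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    download = GCSToLocalFilesystemOperator(
        task_id="download_file",
        bucket="my-bucket",                  # placeholder bucket name
        object_name="path/to/object.csv",    # placeholder object path in the bucket
        filename="/tmp/object.csv",          # local path on the Airflow worker
        gcp_conn_id="google_cloud_default",  # the connection holding the key
    )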

Unit test app using local JSON for GCP authentication?

I want to unit test some FastAPI endpoints which use Google Cloud Platform. I want to write the tests without using os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='path_to_json_file.json' in the files to authenticate (as this service will be in the cloud soon). Is there a way to mock this?
It's slightly unclear from your question but it is unlikely that you would ever want to set GOOGLE_APPLICATION_CREDENTIALS from within your code, partly for this reason.
You should set the variable from the environment:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json
python3 your_code.py
Application Default Credentials (ADC) looks for credentials in 3 locations:
1. The GOOGLE_APPLICATION_CREDENTIALS environment variable
2. gcloud auth application-default login
3. The compute service's identity
For this reason, setting the variable explicitly in code overrides the possibility of #2 (less important) and #3 (more important).
If you set the variable outside of the code when you run the code for testing etc., the credentials will be found automatically and the code will be auth'd.
When you don't set the variable because the code is running on a compute service (e.g. Cloud Run, Compute Engine ...), the service's credentials will be used automatically by ADC and the code will be auth'd.
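As a minimal sketch of what relying on ADC looks like in application code (assuming the google-auth and google-cloud-storage packages purely for illustration), no key path appears anywhere in the code:
import google.auth
from google.cloud import storage

# google.auth.default() walks the three ADC locations listed above and returns
# whichever credentials it finds, plus the associated project (if any).
credentials, project_id = google.auth.default()
client = storage.Client(credentials=credentials, project=project_id)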

Retrieve Kubernetes' Secret from Google Composer (Airflow)

I have an Apache Airflow instance running on Kubernetes (via Google Cloud Composer). I want to retrieve one variable stored in a Secret.
I need to consume the variables stored in this Secret from a DAG in Airflow (Python).
The variables are exposed as environment variables, so in Python it is quite easy:
import os
print(os.environ['DB_USER'])

Google Cloud Platform quotas monitoring (alerting)

I have a question about quotas monitoring (I mean the quotas in https://console.cloud.google.com -> IAM & admin -> Quotas).
I need to configure alerting for cases when the remaining capacity of a quota for any service is less than 20%, for example. Has anybody done something like that? Maybe Google Cloud has some standard tools for that? If not, is it possible to do with Python + the gcloud module?
If you are interested in Compute Engine quotas, there is a standard Google Cloud tool to list them using this API call, or you can use the following CLI command, which lists them in YAML format:
gcloud compute project-info describe --project myproject
You can then use a cron job to perform the check on a regular schedule, calling the API and verifying that the condition usage/limit < 0.8 is met.
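A hedged sketch of such a check in Python with the google-api-python-client library, reading the same projects.get data the gcloud command above prints; the project id and the 80% threshold are placeholders:
from googleapiclient import discovery

# Uses Application Default Credentials; returns the same quota data that
# `gcloud compute project-info describe` shows.
compute = discovery.build("compute", "v1")
project_info = compute.projects().get(project="my-project").execute()

for quota in project_info.get("quotas", []):
    limit, usage = quota["limit"], quota["usage"]
    if limit > 0 and usage / limit >= 0.8:
        print(f"Quota {quota['metric']} is at {usage}/{limit} ({usage / limit:.0%})")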

Create service principal programmatically in Azure Python API

How can I, using the Azure Python API, create a full set of credentials that can later be used to start and deallocate all VMs in a named resource group, without any other permissions?
I have thoroughly researched the example code and both official and unofficial documentation, but I don't even know where to start...
I know I will need a tenant ID, client ID, client secret and subscription ID. Which of those can I make using an API, and how would I go about assigning roles to allow for starting/deallocating VMs of an existing resource group?
Sample code highly sought after, but will take any hint!
You need the azure-graphrbac package to create a Service Principal:
https://learn.microsoft.com/python/api/overview/azure/activedirectory
The closest thing to a sample might be this unit test:
https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/graphrbac/azure-graphrbac/tests/test_graphrbac.py
For role and permissions, you need azure-mgmt-authorization:
https://learn.microsoft.com/python/api/overview/azure/authorization
The best sample for this one is probably this sub-part of this sample:
https://github.com/Azure-Samples/compute-python-msi-vm#role-assignement-to-the-msi-credentials
"msi_identity" is a synonym of "service principal" in your context.
Note that all of this is supported by the CLI v2.0:
https://learn.microsoft.com/cli/azure/ad/sp
https://learn.microsoft.com/cli/azure/role/assignment
It might be interesting to run the CLI in --debug mode and sniff around in the code repo at the same time:
https://github.com/Azure/azure-cli
(full disclosure, I work at MS in the Azure SDK for Python team)
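A rough, hedged sketch of how those two packages fit together; model and parameter shapes differ between SDK versions, and every id, name and secret below is a placeholder, so treat this as an outline rather than a drop-in implementation:
import uuid

from azure.common.credentials import ServicePrincipalCredentials
from azure.graphrbac import GraphRbacManagementClient
from azure.graphrbac.models import ApplicationCreateParameters, ServicePrincipalCreateParameters
from azure.mgmt.authorization import AuthorizationManagementClient

tenant_id = "00000000-0000-0000-0000-000000000000"        # placeholder
subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder

# Credentials of an existing identity that is allowed to create apps/SPs
# (note the Graph resource) and to create role assignments.
graph_creds = ServicePrincipalCredentials(
    client_id="admin-client-id", secret="admin-secret", tenant=tenant_id,
    resource="https://graph.windows.net",
)
graph_client = GraphRbacManagementClient(graph_creds, tenant_id)

# 1. Create an AAD application, then a service principal for it.
app = graph_client.applications.create(ApplicationCreateParameters(
    available_to_other_tenants=False,
    display_name="vm-start-stop-sp",
    identifier_uris=["http://vm-start-stop-sp"],
))
sp = graph_client.service_principals.create(ServicePrincipalCreateParameters(
    app_id=app.app_id, account_enabled=True,
))

# 2. Assign "Virtual Machine Contributor" scoped to one resource group, so the
#    new SP can start/deallocate VMs there and nothing else.
arm_creds = ServicePrincipalCredentials(
    client_id="admin-client-id", secret="admin-secret", tenant=tenant_id,
)
auth_client = AuthorizationManagementClient(arm_creds, subscription_id)
scope = f"/subscriptions/{subscription_id}/resourceGroups/my-resource-group"
role = next(auth_client.role_definitions.list(
    scope, filter="roleName eq 'Virtual Machine Contributor'"))
auth_client.role_assignments.create(scope, str(uuid.uuid4()), {
    "role_definition_id": role.id,
    "principal_id": sp.object_id,
})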
