Access a file on Google Drive using a Python script

Can anyone help me with how to fetch a file stored in Google Drive?
I have a Compute Engine VM in GCP with an associated service account. This service account has access to the Google Drive folder.
I'd like to run a Python script on the VM that accesses the file on Google Drive.
I'm not sure how to do this.

I guess you can try to impersonate the service account you are using.
Attaching a service account to a resource
For some Google Cloud resources, you can specify a user-managed service account that the resource uses as its default identity. This process is known as attaching the service account to the resource, or associating the service account with the resource.
When a resource needs to access other Google Cloud services and resources, it impersonates the service account that is attached to itself. For example, if you attach a service account to a Compute Engine instance, and the applications on the instance use a client library to call Google Cloud APIs, those applications automatically impersonate the attached service account.
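To make this concrete, here is a minimal sketch, assuming the `google-auth` and `google-api-python-client` packages are installed, the Drive API is enabled in the project, and the Drive folder has been shared with the VM's service account. The function names (`drive_query_for`, `download_from_drive`) and parameters are hypothetical. On Compute Engine, `google.auth.default()` returns the attached service account's credentials. As far as I know, the VM's access scopes may also need to include a Drive scope, because the default `cloud-platform` scope does not cover Drive. If the folder lives in a shared drive, the `list` call would probably also need `supportsAllDrives=True` and `includeItemsFromAllDrives=True`.

```python
"""Sketch: download a Google Drive file from a Compute Engine VM using the
VM's attached service account via Application Default Credentials (ADC)."""
import io
from typing import Optional

DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def drive_query_for(folder_id: str, file_name: str) -> str:
    """Build a Drive v3 search query for a non-trashed file inside a folder."""
    def quote(value: str) -> str:
        # Drive query values are single-quoted; escape backslashes and quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return f"{quote(folder_id)} in parents and name = {quote(file_name)} and trashed = false"


def download_from_drive(folder_id: str, file_name: str, dest_path: str) -> Optional[str]:
    """Download the first matching file to dest_path; return its ID, or None if not found."""
    # Client libraries are imported here so the query helper works without them.
    import google.auth
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    # On Compute Engine, ADC resolves to the attached service account.
    credentials, _project = google.auth.default(scopes=DRIVE_SCOPES)
    service = build("drive", "v3", credentials=credentials)

    result = service.files().list(
        q=drive_query_for(folder_id, file_name),
        fields="files(id, name)",
        pageSize=1,
    ).execute()
    files = result.get("files", [])
    if not files:
        return None

    file_id = files[0]["id"]
    request = service.files().get_media(fileId=file_id)
    with io.FileIO(dest_path, "wb") as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            # Download the file in chunks until the last one arrives.
            _status, done = downloader.next_chunk()
    return file_id
```

Usage on the VM might look like `download_from_drive("FOLDER_ID", "report.csv", "/tmp/report.csv")`, where the folder ID is the last path segment of the folder's Drive URL.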
Let me know if this was helpful.

Related

Roles required to write to Cloud Storage (GCP) from Python (pandas)

I have a question for the GCP connoisseurs among you.
I have an issue: I can upload to a bucket via the UI and gsutil, but if I try to do this via Python
df.to_csv('gs://BUCKET_NAME/test.csv')
I get a 403 insufficient permission error.
My guess at the moment is that Python does this via an API and requires an extra role. To make things more confusing, I am already owner of the project that contains the bucket, and when I compared my permissions with those of other team members, I did not find anything missing for this specific bucket.
I use Python 3.9.1 via pyenv and pandas 1.4.2.
Has anyone had the same issue, or does anyone know which role I am missing?
I checked that, in principle, I have the rights to upload both via the UI and gsutil.
I used the same virtual Python environment to read from and write to BigQuery, to check that I can in principle use GCP data from Python; this works.
I have the following roles on the bucket:
Storage Admin, Storage Object Admin, Storage Object Creator, Storage Object Viewer
gsutil and gcloud share credentials.
These credentials are not shared with other code running locally.
The quick-fix but sub-optimal solution is to:
gcloud auth application-default login
And run the code again.
Your code will then pick up your gcloud (gsutil) user credentials through Application Default Credentials, just as it would pick up a service account's credentials.
These credentials are stored (on Linux) in ${HOME}/.config/gcloud/application_default_credentials.json.
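The lookup order can be illustrated with a simplified sketch. This is not the real `google.auth.default()` implementation, which also honors `CLOUDSDK_CONFIG`, other platforms' paths, and finally the metadata server; `locate_adc_file` is a hypothetical helper. It only shows that an explicit `GOOGLE_APPLICATION_CREDENTIALS` key file takes precedence over the file written by `gcloud auth application-default login`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def locate_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Simplified ADC credential-file lookup on Linux (no metadata server)."""
    # 1. An explicit key file set via GOOGLE_APPLICATION_CREDENTIALS wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # 2. Otherwise, the file written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return well_known
    # 3. Nothing found locally; real ADC would next try the metadata server.
    return None


if __name__ == "__main__":
    print(locate_adc_file(os.environ, Path.home()))
```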
A better solution is to create a Service Account specifically for your app and grant it the minimal set of IAM permissions that it will need (BigQuery, GCS, ...).
For testing purposes (!) you can download the Service Account key locally.
You can then authenticate your code using Google's Application Default Credentials (ADC) by running (on Linux):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json
python3 your_app.py
When you deploy code that leverages ADC to a Google Cloud compute service (Compute Engine, Cloud Run, ...), it can be deployed unchanged because the credentials for the compute resource will be automatically obtained from the Metadata service.
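As an alternative to `df.to_csv('gs://...')`, where pandas delegates `gs://` paths to the `gcsfs` package, you can write through the `google-cloud-storage` client, which authenticates with ADC as described above. This is a sketch assuming `google-cloud-storage` is installed; `split_gcs_uri` and `upload_csv_text` are hypothetical helpers.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"expected gs://BUCKET/OBJECT, got {uri!r}")
    return bucket, blob_name


def upload_csv_text(csv_text: str, uri: str) -> None:
    """Upload CSV text to Cloud Storage using Application Default Credentials."""
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    bucket_name, blob_name = split_gcs_uri(uri)
    # storage.Client() obtains credentials (and a default project) through ADC.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")
```

With a DataFrame `df`, a call could look like `upload_csv_text(df.to_csv(index=False), "gs://BUCKET_NAME/test.csv")`. If this succeeds while the pandas call fails, the difference is likely in which credentials each library picks up rather than in the bucket's IAM roles.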
You can Google e.g. "Google IAM BigQuery" to find the documentation that lists the roles:
IAM roles for BigQuery
IAM roles for Cloud Storage

Cloud Composer + Airflow: Setting up DAGs to trigger on HTTP (or should I use Cloud Functions?)

Ultimately, what I want is a Python script that runs dynamically whenever an HTTP request comes in. For example: App 1 runs and sends out a webhook; the Python script catches the webhook immediately and does whatever it does.
I saw that you could do this in GCP with Composer and Airflow.
But I'm having several issues following these instructions: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf
Running this in Cloud Shell to grant blob signing permissions:
gcloud iam service-accounts add-iam-policy-binding \
your-project-id@appspot.gserviceaccount.com \
--member=serviceAccount:your-project-id@appspot.gserviceaccount.com \
--role=roles/iam.serviceAccountTokenCreator
When I put in my project ID, I get "Gaia id not found for your-project-id@appspot.gserviceaccount.com".
When I run the airflow_uri = environment_data['config']['airflowUri'] part, I get a KeyError on 'config'.
Is there a better way to do what I'm trying to do (i.e. run Python scripts dynamically)?
You get the "Gaia id not found for email <project-id>@appspot.gserviceaccount.com" error because not all of the required APIs are enabled in your project. Follow these steps:
Create or select the Google Cloud Platform project you want to work with.
Enable the Cloud Composer, Google Cloud Functions, Cloud Identity, and Google Identity and Access Management (IAM) APIs. You can find them under Menu -> Products -> Marketplace by typing the name of the corresponding API.
Grant blob signing permissions to the Cloud Functions service account. For GCF to authenticate to Cloud IAP, the proxy that protects the Airflow webserver, you need to grant the App Engine (appspot) service account used by GCF the Service Account Token Creator role. Do so by running the following command in Cloud Shell, substituting the name of your project for <your-project-id>:
gcloud iam service-accounts add-iam-policy-binding \
<your-project-id>@appspot.gserviceaccount.com \
--member=serviceAccount:<your-project-id>@appspot.gserviceaccount.com \
--role=roles/iam.serviceAccountTokenCreator
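Conceptually, `add-iam-policy-binding` is a read-modify-write on the service account's IAM policy: fetch the policy, add the member to the binding for the role, and write the policy back. The pure function below models only the "modify" step on the policy's JSON shape; it is a simplified, hypothetical helper (the real command also handles the policy `etag` and conditional bindings) and does not call any API.

```python
from copy import deepcopy
from typing import Any, Dict


def add_iam_binding(policy: Dict[str, Any], role: str, member: str) -> Dict[str, Any]:
    """Return a copy of an IAM policy with `member` granted `role` (idempotent)."""
    updated = deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No binding for this role yet: create one.
    bindings.append({"role": role, "members": [member]})
    return updated
```

Calling it twice with the same role and member leaves the policy unchanged the second time, which mirrors how re-running the gcloud command is harmless.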
I tested the scenario: first without enabling the APIs, and I got the same error as you. After enabling the APIs, the error disappeared and the IAM policy was updated correctly.
There is also a well-described Codelabs tutorial that shows the workflow of triggering a DAG with Google Cloud Functions.
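The answer does not cover the KeyError on 'config' from the question. One plausible cause, which is an assumption not confirmed in this thread, is that the Composer `environments.get` request failed (for example a wrong project, location, or environment name, or the Composer API not being enabled), so the JSON body holds a Google API `error` object instead of the environment. A hypothetical helper like the one below surfaces that error instead of a bare KeyError.

```python
from collections.abc import Mapping
from typing import Any


def extract_airflow_uri(environment_data: Mapping[str, Any]) -> str:
    """Return config.airflowUri from a Composer environments.get response."""
    # A failed request returns {"error": {"code", "message", "status"}} and no "config".
    if "error" in environment_data:
        error = environment_data["error"]
        raise RuntimeError(
            f"Composer API error {error.get('code')} {error.get('status')}: {error.get('message')}"
        )
    config = environment_data.get("config")
    if not isinstance(config, Mapping) or "airflowUri" not in config:
        raise RuntimeError(
            f"response has no config.airflowUri; top-level keys: {sorted(environment_data)}"
        )
    return config["airflowUri"]
```

In the tutorial's flow, `airflow_uri = extract_airflow_uri(environment_data)` would replace the direct dictionary lookup.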

Best practices to store credentials in your Python script

My setup is: the code is in the private repository in Github which I run from AWS EC2.
I'm unsure where I should store the API and database credentials. My feeling at the moment is that no credentials should be stored in the code and that I should use AWS Secrets Manager instead, but then I also need to connect to AWS. What is your view on it? Disclosure: I am just starting with Python, so please be gentle.
Never store your secrets in code. In your case I would recommend storing them in AWS Secrets Manager (or as SecureString parameters in AWS Systems Manager Parameter Store).
I would recommend creating an IAM role for your EC2 instance with a policy that allows the role to read the correct secrets from AWS Secrets Manager. Connect the role to an instance profile, and the instance profile to the EC2 instance. This is done automatically in the AWS console, but not when you're using CloudFormation. An instance profile is a kind of wrapper around a role that allows the role to be attached to an instance.
In this flow, your EC2 instance is allowed to read the secrets from Secrets Manager (or Parameter Store) through the instance profile and role. Roles are the recommended way to let AWS resources interact with each other, because they use temporary credentials and restrict access.
With the above setup, you should be able to read the secrets from within your code as explained here. You can use boto3 (the AWS SDK for Python) on the EC2 instance to interact with Secrets Manager.
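The role setup above involves two policy documents: a trust policy that lets EC2 assume the role, and a permissions policy that allows reading only the specific secrets. The sketch below builds both as Python dicts; the helper name and the ARN are hypothetical. If a secret is encrypted with a customer-managed KMS key, the role would likely also need `kms:Decrypt` on that key.

```python
import json
from typing import Any, Dict, Iterable


def secrets_read_policy(secret_arns: Iterable[str]) -> Dict[str, Any]:
    """Least-privilege permissions policy: read only the listed secrets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": list(secret_arns),
            }
        ],
    }


# Trust policy allowing EC2 instances (via the instance profile) to assume the role.
EC2_TRUST_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-app/db-AbCdEf"
    print(json.dumps(secrets_read_policy([arn]), indent=2))
    print(json.dumps(EC2_TRUST_POLICY, indent=2))
```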
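A minimal sketch of the boto3 side, assuming the secret stores a JSON object (for example database username and password). `get_secret` and `parse_secret_response` are hypothetical helpers; `client.get_secret_value(SecretId=...)` is the boto3 Secrets Manager call. On EC2, boto3's default credential chain picks up the instance profile's temporary credentials, so no keys appear in the code.

```python
import json
from typing import Any, Dict, Mapping


def parse_secret_response(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Decode a GetSecretValue response whose secret holds a JSON object."""
    if "SecretString" in response:
        return json.loads(response["SecretString"])
    # Binary secrets are returned as bytes under SecretBinary.
    return json.loads(response["SecretBinary"])


def get_secret(secret_id: str, region_name: str) -> Dict[str, Any]:
    """Fetch and decode a secret using the EC2 instance profile's credentials."""
    import boto3  # imported lazily; requires boto3

    client = boto3.client("secretsmanager", region_name=region_name)
    return parse_secret_response(client.get_secret_value(SecretId=secret_id))
```

For example, `get_secret("my-app/db", "eu-west-1")["password"]`, where the secret name, region, and key are hypothetical.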

Google OAuth2 Service Account Auth with Domain-wide Delegation Using Default Credentials

Has anyone been able to get domain-wide delegation working using default credentials (e.g. an App Engine default service account, or credentials derived from the GOOGLE_APPLICATION_CREDENTIALS environment variable), specifically with the Drive or Gmail API? We've been able to follow this guide to use default credentials with Admin SDK APIs, but not with user-centric APIs like Gmail and Drive. We really dislike the key management situation we're stuck in, deploying keys with code or loading them into GCS buckets, especially since many GCP-centric clients don't have this problem (e.g. the google-cloud-firestore or google-cloud-bigquery Python clients).
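One keyless approach, sketched here under assumptions rather than confirmed in this thread, is to have the IAM Credentials API sign the delegated JWT instead of a local private key. `google.auth.iam.Signer` does that signing, and `google.oauth2.service_account.Credentials` accepts such a signer together with a `subject` (the Workspace user to impersonate). This likely requires the IAM Credentials API to be enabled, the default identity to hold the Service Account Token Creator role on the target service account, and domain-wide delegation configured in the Workspace admin console for that service account's client ID and scopes. `delegated_credentials` is a hypothetical helper.

```python
from typing import Sequence

TOKEN_URI = "https://oauth2.googleapis.com/token"


def delegated_credentials(service_account_email: str, subject: str, scopes: Sequence[str]):
    """Domain-wide-delegated credentials signed via the IAM Credentials API (no key file)."""
    import google.auth
    from google.auth import iam
    from google.auth.transport.requests import Request
    from google.oauth2 import service_account

    # Default credentials: App Engine / Compute Engine metadata, or other ADC sources.
    source_credentials, _project = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    # The signer calls the IAM signBlob endpoint instead of using a local private key.
    signer = iam.Signer(Request(), source_credentials, service_account_email)
    return service_account.Credentials(
        signer,
        service_account_email,
        TOKEN_URI,
        scopes=list(scopes),
        subject=subject,  # the Workspace user whose Gmail/Drive data is accessed
    )
```

The result could then be passed to a client, e.g. `build("gmail", "v1", credentials=delegated_credentials("my-sa@my-project.iam.gserviceaccount.com", "user@example.com", ["https://www.googleapis.com/auth/gmail.readonly"]))`, with hypothetical account names.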

Using Google Cloud Storage Files with Jupyter Notebook on Cloud Compute

I am working on a machine learning project and I just set up a google cloud account.
I have a VM instance up and running and Jupyter is working. I placed a couple of file folders on Google Cloud Storage assuming I could connect it to my VM and use the files in a Jupyter notebook running Python 3.
I have not been able to find a way to access the files in storage from my virtual machine. Someone help please!?
To access Cloud Storage from a VM, the VM needs to have been created with access to the Cloud Storage API. When you initially create the VM, there are a number of options available under the Cloud API access scopes section. Select the Storage permission to give your VM access to Cloud Storage.
Once the VM has access to storage, you can use the gsutil command to access data directly in the Cloud Storage bucket by referring to the bucket's name.
If you wish, you can also extend access to the storage bucket to colleagues. Access permissions for the project are controlled in the IAM section of Google Cloud.
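From inside a Jupyter notebook you can do the same from Python instead of shelling out to gsutil. This sketch assumes `google-cloud-storage` is installed on the VM and that the VM's service account has read access to the bucket (for example via `roles/storage.objectViewer`); `local_path_for` and `download_prefix` are hypothetical helpers, and `prefix` is assumed to be a "folder" such as `"data/"`.

```python
from pathlib import Path, PurePosixPath


def local_path_for(blob_name: str, prefix: str, dest_dir: str) -> Path:
    """Map an object name under `prefix` to a path in dest_dir, keeping subfolders."""
    blob_path = PurePosixPath(blob_name)
    relative = blob_path.relative_to(prefix) if prefix else blob_path
    return Path(dest_dir, *relative.parts)


def download_prefix(bucket_name: str, prefix: str, dest_dir: str) -> int:
    """Download every object under `prefix` into dest_dir; return how many were saved."""
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    client = storage.Client()  # uses the VM's service account via ADC
    count = 0
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))
        count += 1
    return count
```

For small files you could skip the local copy and read an object straight into pandas with `pd.read_csv(io.BytesIO(blob.download_as_bytes()))`.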
