I'm having issues executing a Cloud Function on GCP that tries to update some Google Sheets of mine. I got this script working in Jupyter but have struggled to deploy it as a Cloud Function. My issue seems to be authorizing the Cloud Function to post to Google Sheets.
I've tried many things over 6+ hours (most of the relevant Stack Overflow questions, Medium articles, GitHub issues) but haven't been able to find a working solution. I don't think it's a roles/permissions issue. I understand how some of these approaches work outside Cloud Functions, but not inside one.
Ultimately, from what I've seen, the best way seems to be to host my JSON secret key inside a storage bucket and read it from there. I've tried this with no success, and it does seem somewhat convoluted, given that everything involved is a Google service.
I've honestly gone back to my original code, so I'm back to the first error, which is simply that my JSON key cannot be found: when I was running the script in Jupyter, the key was in the same directory... hence why I created a Google Storage bucket to try to link to.
import pandas as pd
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import google.cloud
from df2gspread import df2gspread as d2g
from df2gspread import gspread2df as g2d
import datetime
import time
import numpy as np

def myGet(event, context):
    scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
    credentials = ServiceAccountCredentials.from_json_keyfile_name('my-key-name.json', scope)
    gc = gspread.authorize(credentials)
    spreadsheet_key = '--removed actual key/id--'

ERROR:
  File "/env/local/lib/python3.7/site-packages/oauth2client/service_account.py", line 219, in from_json_keyfile_name
    with open(filename, 'r') as file_obj:
FileNotFoundError: [Errno 2] No such file or directory: 'my-key-name.json'
Thanks very much for any guidance and support on this. I have thoroughly looked and tried to solve this on my own.
EDIT: Please keep in mind that this is not a .py file living in a directory; that's part of my issue. I don't know where to link to, since it's an isolated "Cloud Function" as far as I can tell.
Some links I've looked at in my 20+ attempts to fix this issue just to name a few:
How can I get access to Google Cloud Storage using an access and a secret key
Accessing google cloud storage bucket from cloud functions throws 500 error
https://cloud.google.com/docs/authentication/getting-started#auth-cloud-implicit-python
https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
UPDATE:
I realized you could upload a zip of your files to show three files in the inline editor. At the beginning I was not doing this so could not figure out where to put the JSON key. Now I have it viewable and need to figure out how to call it in the method.
When I do a test run of the Cloud Function, I get a nondescript error that doesn't show up in the logs, and I can't test it from Cloud Scheduler like I could previously. I found this on Stack Overflow and feel I now need the same approach but for Python, and need to figure out what calls to make from the Google docs.
Cloud Functions: how to upload additional file for use in code?
My advice is not to use a JSON key file in your Cloud Functions (or in any GCP product). With Cloud Functions, as with other GCP products, the service account is loaded automatically at deployment.
The advantage of the Cloud Functions identity is that there is no key file to store secretly, no key rotation to manage for security, and no risk of leaking the key file.
So, use the default service account in your code.
If you need the credentials object, you can use the google-auth Python library for this.
import google.auth
credentials, project_id = google.auth.default()
You'll need to specify a relative filename instead, e.g. ./my-key-name.json, assuming the file is in the same directory as your main.py file.
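If you do bundle the key file with the function's source, a slightly more robust variant (just a sketch; it assumes my-key-name.json is deployed alongside main.py) is to resolve the path relative to the module rather than to the process's working directory:

```python
import os

# Build an absolute path to the key file next to this module, so the
# lookup does not depend on the Cloud Functions runtime's working directory.
KEY_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "my-key-name.json")
```

and then pass KEY_PATH to from_json_keyfile_name() instead of the bare filename.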
I had the same problem and solved it like this:
import google.auth
credentials, _ = google.auth.default()
gc = gspread.authorize(credentials)
That should work for you.
Related
I am trying to get Firestore working in emulator mode with Python on my local Linux PC. Is it possible to use anonymous credentials for this, so I don't have to create new credentials?
I have tried two methods of getting access to the Firestore database from within a Python Notebook, after having run firebase emulators:start from the command-line:
First method:
from firebase_admin import credentials, firestore, initialize_app
project_id = 'my_project_id'
cred = credentials.ApplicationDefault()
initialize_app(cred, {'projectId': project_id})
db = firestore.client()
This raises the following exception:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Second method:
from google.auth.credentials import AnonymousCredentials
from google.cloud.firestore import Client
cred = AnonymousCredentials()
db = Client(project=project_id, credentials=cred)
This works, but then I try to access a document that I manually inserted into the database using the web interface, with the following Python code:
doc = db.collection('my_collection').document('my_doc_id').get()
And then I get the following error, which perhaps indicates that the anonymous credentials don't work:
PermissionDenied: 403 Missing or insufficient permissions.
Thoughts
It is a surprisingly complicated system, and although I have read numerous doc pages, watched tutorial videos, etc., there seems to be an endless labyrinth of configurations that need to be set up in order for it to work. It would be nice if I could get Firestore working on my local PC with minimal effort, to see if the database would work for my application.
Thanks!
Method 2 works if an environment variable is set. In the following, change localhost:8080 to the Firestore server address shown when the emulator is started using firebase emulators:start:
import os
os.environ['FIRESTORE_EMULATOR_HOST'] = 'localhost:8080'
I don't know how to make it work with Method 1 above using the firebase_admin Python package. Perhaps someone else knows.
Also note that the emulated Firestore database will discard all its data when the server is shut down. To persist and reuse the data start the emulator with a command like this:
firebase emulators:start --import=./emulator_data --export-on-exit
I am testing out Cloud Functions and I have things set up, but the output is not populating correctly (the output is not being saved into Cloud Storage, and my print statements are not appearing in the logs). Here are my code and requirements below. I have set up the Cloud Function with an HTTP-request trigger type, unauthenticated invocations, and a runtime service account that has write access to Cloud Storage. I have verified that I am calling the correct entry point.
logs
2022-03-22T18:52:02.749482564Z test-example vczj9p85h5m2 Function execution started
2022-03-22T18:52:04.148507183Z test-example vczj9p85h5m2 Function execution took 1399 ms.
Finished with status code: 200
main.py
import requests
from google.cloud import storage
import json

def upload_to_gsc(data):
    print("saving to cloud storage")
    client = storage.Client(project="my-project-id")
    bucket = client.bucket("my-bucket-name")
    blob = bucket.blob("subfolder/name_of_file")
    blob.upload_from_string(data)
    print("data uploaded to cloud storage")

def get_pokemon(request):
    url = "https://pokeapi.co/api/v2/pokemon?limit=100&offset=200"
    data = requests.get(url).json()
    output = [i.get("name") for i in data["results"]]
    data = json.dumps(output)
    upload_to_gsc(data=data)
    print("saved data!")
    # HTTP-triggered functions should return a response body.
    return "saved data!"
requirements.txt
google-cloud-storage
requests==2.26.0
As @JackWotherspoon mentioned, be sure to double-check your project ID, bucket name, and entry point if you have a case like I did. For myself, I recreated the Cloud Function, tested it, and it worked again.
As @dko512 mentioned in the comments, the issue was resolved by recreating and redeploying the Cloud Function.
Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.
Feel free to edit this answer for additional information.
I was asked to perform an integration with an external Google Storage bucket, and I received a credentials JSON.
After configuring myself with the creds JSON, running
gsutil ls gs://bucket_name gives a valid response, and uploading a file into the bucket works as well.
When trying to do the same with Python 3, however, it does not work. Using google-cloud-storage==1.16.0 (I also tried the newer versions), I'm doing:
from google.cloud import storage
from google.oauth2 import service_account

project_id = credentials_dict.get("project_id")
credentials = service_account.Credentials.from_service_account_info(credentials_dict)
client = storage.Client(credentials=credentials, project=project_id)
bucket = client.get_bucket(bucket_name)
But on the get_bucket line, I get:
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/storage/v1/b/BUCKET_NAME?projection=noAcl: USERNAME#PROJECT_ID.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket.
The external partner I'm integrating with says that the user is set up correctly, and to prove it they point out that I can perform the action with gsutil.
Can you please assist? Any idea what the problem might be?
The answer was that the creds were indeed wrong, but the call did work when I used client.bucket(bucket_name) instead of client.get_bucket(bucket_name).
Please follow these steps in order to correctly set up the Cloud Storage client library for Python. In general, the Cloud Storage libraries can use Application Default Credentials or environment variables for authentication.
Notice that the recommended method is to set up authentication using an environment variable (i.e. if you are using Linux: export GOOGLE_APPLICATION_CREDENTIALS="/path/to/[service-account-credentials].json" should work) and avoid the use of the service_account.Credentials.from_service_account_info() method altogether:

from google.cloud import storage

storage_client = storage.Client(project='project-id-where-the-bucket-is')
bucket_name = "your-bucket"
bucket = storage_client.get_bucket(bucket_name)

This should simply work, because the authentication is handled by the client library via the environment variable.
Now, if you are interested in explicitly using the service account instead of using the service_account.Credentials.from_service_account_info() method, you can use the from_service_account_json() method directly in the following way:

from google.cloud import storage

# Explicitly use service account credentials by specifying the private key
# file.
storage_client = storage.Client.from_service_account_json(
    '/[service-account-credentials].json')
bucket_name = "your-bucket"
bucket = storage_client.get_bucket(bucket_name)
Find all the relevant details as to how to provide credentials to your application here.
tl;dr: don't use client.get_bucket at all.
See https://stackoverflow.com/a/51452170/705745 for a detailed explanation and solution.
Please could someone help me with a query related to permissions on the Google Cloud Platform? I realise that this is only loosely programming-related, so I apologise if this is the wrong forum!
I have a project ("ProjectA") written in Python that uses Google's Cloud Storage and Compute Engine. The project has various buckets that are accessed using Python code from both Compute instances and my home computer. This project uses a service account that is a project "owner"; I believe it has all APIs enabled, and the project works really well. The service account name is "master@projectA.iam.gserviceaccount.com".
Recently I started a new project that needs similar resources (storage, compute, etc.), but I want to keep it separate. The new project is called "ProjectB", and I set up a new master service account called master@projectB.iam.gserviceaccount.com. My code in ProjectB generates an error related to access permissions, demonstrated even if I strip the code down to these few lines:
The code from ProjectA looked like this:
from google.cloud import storage
client = storage.Client(project='projectA')
mybucket = storage.bucket.Bucket(client=client, name='projectA-bucket-name')
currentblob = mybucket.get_blob('somefile.txt')
The code from ProjectB looks like this:
from google.cloud import storage
client = storage.Client(project='projectB')
mybucket = storage.bucket.Bucket(client=client, name='projectB-bucket-name')
currentblob = mybucket.get_blob('somefile.txt')
Both buckets definitely exist, and obviously if "somefile.txt" does not exist then currentblob is None, which is fine; but when I execute this code I receive the following error:
Traceback (most recent call last):
  File .... .py", line 6, in <module>
    currentblob = mybucket.get_blob('somefile.txt')
  File "C:\Python27\lib\site-packages\google\cloud\storage\bucket.py", line 599, in get_blob
    _target_object=blob,
  File "C:\Python27\lib\site-packages\google\cloud\_http.py", line 319, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/storage/v1/b/<ProjectB-bucket>/o/somefile.txt: master@ProjectA.iam.gserviceaccount.com does not have storage.objects.get access to projectB/somefile.txt.
Notice how the error message says the "ProjectA" service account doesn't have ProjectB access. Well, I would somewhat expect that, but I was expecting to use the service account for ProjectB!
I have read the documentation and links such as this and this, but even after removing and reinstating the service account or giving it limited scopes, nothing has helped. I have tried a few things:
1) Make sure that my new service account was "activated" on my local machine (where the code is being run for now):
gcloud auth activate-service-account master@projectB.iam.gserviceaccount.com --key-file="C:\my-path-to-file\123456789.json"
This appears to be successful.
2) Verify the list of credentialled accounts:
gcloud auth list
This lists two accounts: one is my email address (that I use for Gmail, etc.), and the other is master@projectB.iam.gserviceaccount.com, so it appears that my account is "registered" properly.
3) Set the service account as the active account:
gcloud config set account master@projectB.iam.gserviceaccount.com
When I look at the auth list again, there is an asterisk "*" next to the service account, so presumably this is good.
4) Check that the project is set to ProjectB:
gcloud config set project projectB
This also appears to be ok.
It's strange that when I run the Python code, it is "using" the service account from my old project, even though I have changed seemingly everything to refer to ProjectB: I've activated the account, selected it, etc.
Please could someone point me in the direction of something that I might have missed? I don't recall going through this much pain when setting up my original project, and I'm finding it incredibly frustrating that something I thought would be simple is proving so difficult.
Thank you to anyone who can offer me any assistance.
I'm not entirely sure, but this answer is from a similar question on here:
Permission to Google Cloud Storage via service account in Python
Specify the account explicitly by pointing to the credentials in your code, as documented here.
Example from the documentation page:

def explicit():
    from google.cloud import storage

    # Explicitly use service account credentials by specifying the private
    # key file.
    storage_client = storage.Client.from_service_account_json(
        'service_account.json')

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
Don't you have a GOOGLE_APPLICATION_CREDENTIALS environment variable configured that points to project A's service account?
The default behavior of the Google SDK is to take the service account from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
If you want to change the account, you can do something like:

import os
from google.cloud import storage

credentials_json_file = os.environ.get('env_var_with_path_to_account_json')
client = storage.Client.from_service_account_json(credentials_json_file)

The above assumes you have created a JSON account file as in https://cloud.google.com/iam/docs/creating-managing-service-account-keys and that the path to the JSON account file is stored in the environment variable env_var_with_path_to_account_json.
This way you can have two account files and decide which one to use.
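To check which identity Application Default Credentials actually resolve to, a quick probe like the following can help (a sketch; service-account credentials expose a service_account_email attribute, while user credentials may not, hence the fallback):

```python
import google.auth

def active_identity():
    # Resolve Application Default Credentials the same way the client
    # libraries do, and report which account they belong to.
    credentials, project = google.auth.default()
    email = getattr(credentials, "service_account_email", "<not a service account>")
    return project, email
```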
I would like to be able to access data on a google sheet when running python code via cloud composer; this is something I know how to do in several ways when running code locally, but moving to the cloud is proving challenging. In particular I wish to authenticate as the composer service account rather than stashing the contents of a client_secret.json file somewhere (be that the source code or some cloud location).
For essentially the same question but instead accessing Google Cloud Platform services, this has been relatively easy (even when running through Composer) thanks to the google-cloud-* libraries. For instance, I have verified that I can push data to BigQuery:
from google.cloud import bigquery
client = bigquery.Client()
client.project='test project'
dataset_id = 'test dataset'
table_id = 'test table'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = client.get_table(table_ref)
rows_to_insert = [{'some_column':'test string'}]
errors = client.insert_rows(table,rows_to_insert)
and the success or failure of this can be managed through sharing (or not) 'test dataset' with the composer service account.
Similarly, getting data from a cloud storage bucket works fine:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('test bucket')
name = 'test.txt'
data_blob = bucket.get_blob(name)
data_pre = data_blob.download_as_string()
and once again I have the ability to control access through IAM.
However, for working with Google Sheets it seems I must resort to the Google APIs Python client, and here I run into difficulties. Most documentation on this (which seems to be a moving target!) assumes local code execution, starting with the creation and storage of a client_secret.json file (example 1, example 2), which I understand locally but which doesn't make sense for a shared cloud environment with source control. So, a couple of approaches I've tried instead:
Trying to build credentials using discovery and oauth2
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.contrib import gce
SAMPLE_SPREADSHEET_ID = 'key for test sheet'
SAMPLE_RANGE_NAME = 'test range'
creds = gce.AppAssertionCredentials(scope='https://www.googleapis.com/auth/spreadsheets')
service = build('sheets', 'v4', http = creds.authorize(Http()))
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
                            range=SAMPLE_RANGE_NAME).execute()
values = result.get('values', [])
Caveat: I know nothing about working with scopes to create credential objects via Http. But this seems closest to working: I get an HTTP 403 error of
'Request had insufficient authentication scopes.'
However, I don't know if that means I successfully presented myself as the service account, which was then deemed unsuitable for access (so I need to mess around with permissions some more); or didn't actually get that far (and need to fix this credentials creation process).
Getting a credential object with google.auth and passing to gspread
My (limited) understanding is that oauth2client is being deprecated and google.auth is now the way to go. This yields credentials objects in a similarly simple way to my successful examples above for cloud platform services, that I hoped I could just pass to gspread:
import gspread
from google.auth import compute_engine
credentials = compute_engine.Credentials()
client = gspread.authorize(credentials)
Sadly, gspread doesn't work with these objects, because they don't have the attributes it expects:
AttributeError: 'Credentials' object has no attribute 'access_token'
This is presumably because gspread expects oauth2 credentials, and those produced by google.auth aren't sufficiently compatible. The gspread docs also go down the 'just get a client_secret file' route... but presumably, if I can get the previous (oauth/http-based) approach to work, I could then use gspread for data retrieval. For now, though, a hybrid of these two approaches stumbles in the same way: a permission-denied response due to insufficient authentication scopes.
So, whether using google.auth, oauth2 (assuming that'll stick around for a while) or some other cloud-friendly approach (i.e. not one based on storing the secret key), how can I obtain suitable credentials in a cloud composer environment to make calls to the google sheets API? Bonus marks for a way that is compatible with gspread (and hence gspread_dataframe), but this is not essential. Also happy to hear that this is a PEBCAK error and I just need to configure IAM permissions differently for my current approach to work.
It looks like your Composer environment's oauthScopes config wasn't set up properly. If left unspecified, the default cloud-platform scope doesn't allow you to access the Google Sheets API. You may want to create a new Composer environment with:

oauthScopes = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/cloud-platform",
]
Google sheets API reference: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/create.