Trying to connect to Google Cloud Storage (GCS) using Python

I've built the following script:
import boto
import sys
import gcs_oauth2_boto_plugin
def check_size_lzo(ds):
    CLIENT_ID = 'myclientid'
    CLIENT_SECRET = 'mysecret'
    # URI scheme for Cloud Storage.
    GOOGLE_STORAGE = 'gs'
    dir_file = 'date_id={ds}/apollo_export_{ds}.lzo'.format(ds=ds)
    gcs_oauth2_boto_plugin.SetFallbackClientIdAndSecret(CLIENT_ID, CLIENT_SECRET)
    uri = boto.storage_uri('my_bucket/data/apollo/prod/' + dir_file, GOOGLE_STORAGE)
    key = uri.get_key()
    if key.size < 45379959:
        raise ValueError('umg lzo file is too small, investigate')
    else:
        print('umg lzo file is %sMB' % round((key.size/1e6),2))
if __name__ == "__main__":
    check_size_lzo(sys.argv[1])
It works fine locally, but when I try to run it on a Kubernetes cluster I get the following error:
boto.exception.GSResponseError: GSResponseError: 403 Access denied to 'gs://my_bucket/data/apollo/prod/date_id=20180628/apollo_export_20180628.lzo'
I have updated the .boto file on my cluster and added my OAuth client ID and secret, but I still have the same issue.
Would really appreciate help resolving this issue.
Many thanks!

If it works in one environment and fails in another, I assume that you're getting your auth from a .boto file (or possibly from the OAUTH2_CLIENT_ID environment variable), but your Kubernetes instance is lacking such a file. That you got a 403 instead of a 401 says that your remote server is correctly authenticating as somebody, but that somebody is not authorized to access the object, so presumably you're making the call as a different user.
Unless you've changed something, I'm guessing that you're getting the default Kubernetes Engine auth, which means a service account associated with your project. That service account probably hasn't been granted read permission for your object, which is why you're getting a 403. Grant it read/write permission for your GCS resources, and that should solve the problem.
Also note that, by default, the default credentials aren't scoped to include GCS, so you'll need to add that scope as well and then restart the instance.
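The 401 versus 403 reasoning above can be written down as a small helper. This is only an illustrative sketch: the function name diagnose_auth_status and the wording of the messages are made up for this example and are not part of boto or any Google library.

```python
def diagnose_auth_status(status_code: int) -> str:
    """Map an HTTP status from a storage call to a likely cause (hypothetical helper)."""
    if status_code == 401:
        # The server could not establish who is calling at all.
        return "unauthenticated: no valid credentials were presented"
    if status_code == 403:
        # The caller was identified, but that identity lacks permission.
        return "authenticated but not authorized: check which account is calling and what it was granted"
    return "not an auth failure"


print(diagnose_auth_status(403))
```

In the situation described above, a 403 would point towards checking which service account the cluster runs as, rather than re-entering the OAuth client ID and secret.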

Related

Flask web app on Cloud Run - google.auth.exceptions.DefaultCredentialsError:

I'm hosting a Flask web app on Cloud Run. I'm also using Secret Manager to store Service Account keys. (I previously downloaded a JSON file with the keys)
In my code, I'm accessing the payload then using os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = payload to authenticate. When I deploy the app and try to visit the page, I get an Internal Service Error. Reviewing the logs, I see:
File "/usr/local/lib/python3.10/site-packages/google/auth/_default.py", line 121, in load_credentials_from_file
raise exceptions.DefaultCredentialsError(
google.auth.exceptions.DefaultCredentialsError: File {"
I can access the secret through gcloud just fine with: gcloud secrets versions access 1 --secret="<secret_id>" while acting as the Service Account.
Here is my Python code:
import os
from google.cloud import secretmanager
# Grabbing keys from Secret Manager
def access_secret_version():
    # Create the Secret Manager client.
    client = secretmanager.SecretManagerServiceClient()
    # Build the resource name of the secret version.
    name = "projects/{project_id}/secrets/{secret_id}/versions/1"
    # Access the secret version.
    response = client.access_secret_version(request={"name": name})
    payload = response.payload.data.decode("UTF-8")
    return payload
@app.route('/page/page_two')
def some_random_func():
    # New way
    payload = access_secret_version() # <---- calling the payload
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = payload
    # Old way
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service-account-keys.json"
I'm not technically accessing a JSON file like I was before. The payload variable is storing the entire key. Is this why it's not working?
Your approach is incorrect.
When you run on a Google compute service like Cloud Run, the code runs under the identity of the compute service.
In this case, by default, Cloud Run uses the Compute Engine default service account, but it's good practice to create a dedicated service account for your service and specify it when you deploy to Cloud Run (see Service accounts).
This mechanism is one of the "legs" of Application Default Credentials: when your code is running on Google Cloud, you don't set the environment variable (and you don't need to create a key), and the Cloud Run service acquires its credentials from the Metadata service:
import google.auth
credentials, project_id = google.auth.default()
See google.auth package
It is bad practice to set an environment variable within code. By their nature, environment variables should be provided by the environment. Setting GOOGLE_APPLICATION_CREDENTIALS in code means that your code always sets this value, when it should only be set when the code is running off Google Cloud.
For completeness, if you need to create Credentials from a JSON string rather than from a file containing a JSON string, you can use from_service_account_info (see google.oauth2.service_account)
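A minimal sketch of that last option, assuming the Secret Manager payload is the service account key as a JSON string. GOOGLE_APPLICATION_CREDENTIALS is expected to hold a file path, not the key contents, which is consistent with the error message above beginning with "File {". The helper names parse_key_payload and credentials_from_payload are hypothetical; only from_service_account_info comes from google.oauth2.service_account.

```python
import json


def parse_key_payload(payload: str) -> dict:
    """Turn a key held as a JSON string into the dict that from_service_account_info expects."""
    info = json.loads(payload)
    if info.get("type") != "service_account":
        raise ValueError("payload does not look like a service account key")
    return info


def credentials_from_payload(payload: str):
    """Build credentials directly from the JSON string, without writing a file."""
    # Imported lazily so the parsing helper above can be used without google-auth installed.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_info(parse_key_payload(payload))
```

The resulting object can be passed to a client constructor through its credentials argument. On Cloud Run itself this should not be needed, since google.auth.default() already picks up the service's identity.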

How does client= translate.TranslationServiceClient() work in conjunction with os.environ['GOOGLE_APPLICATION_CREDENTIALS']

I am using Python and an Azure Function app to send a document to be translated using the Google Cloud Translation API.
I am trying to load the credentials from a tempfile (json) using the below code. The idea is to later download the json file from blob storage and store it in a temp file but I am not thinking about the blob storage for now.
key= {cred info}
f= tempfile.NamedTemporaryFile(suffix='.json', mode='a+')
json.dump(key, f)
f.flush()
f.seek(0)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = f.name
client= translate.TranslationServiceClient()
But when I run this I get the following error:
Exception: PermissionError: [Errno 13] Permission denied:
How can I correctly load the creds from a temp file? Also, what is the relationship between translate.TranslationServiceClient() and os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = f.name? Does TranslationServiceClient() get the creds from the environment variable?
I have been looking at this problem for a while now and I cannot find a good solution. Any help would be amazing!
edit:
when I change it to
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = f.read()
I get a different error:
System.Private.CoreLib: Exception while executing function:
Functions.Trigger. System.Private.CoreLib: Result: Failure
Exception: DefaultCredentialsError:
EDIT 2:
It's really weird, but it works when I read the file just before, like so:
contents= f.read()
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = f.name
client= translate.TranslationServiceClient()
Any ideas why?
Any application that connects to a GCP product needs credentials to authenticate, and there are several ways to supply them.
According to the Google doc:
Additionally, we recommend you use Google Cloud Client Libraries for your application. Google Cloud Client Libraries use a library called Application Default Credentials (ADC) to automatically find your service account credentials. ADC looks for service account credentials in the following order:
If the environment variable GOOGLE_APPLICATION_CREDENTIALS is set, ADC uses the service account key or configuration file that the variable points to.
If the environment variable GOOGLE_APPLICATION_CREDENTIALS isn't set, ADC uses the service account that is attached to the resource that is running your code.
This service account might be a default service account provided by Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, or Cloud Functions. It might also be a user-managed service account that you created.
If ADC can't use any of the above credentials, an error occurs.
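The lookup order quoted above can be sketched as a plain function. This mirrors only the steps as quoted and is not the real ADC implementation; the function name describe_adc_source and its parameters are invented for illustration.

```python
import os


def describe_adc_source(environ=None, attached_service_account=None):
    """Return which credential source the quoted lookup order would pick (illustrative only)."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        # Step 1: the variable is treated as a path to a key or configuration file.
        return f"key or config file at {path}"
    if attached_service_account:
        # Step 2: fall back to the service account attached to the running resource.
        return f"attached service account {attached_service_account}"
    # Step 3: nothing usable was found.
    raise RuntimeError("no credentials found")


print(describe_adc_source({"GOOGLE_APPLICATION_CREDENTIALS": "/tmp/key.json"}))
```

This is why TranslationServiceClient() called with no arguments is affected by the environment variable: the client asks ADC for credentials, and ADC checks the variable first.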
There are also modules provided by Google that can be used to pass the credentials explicitly.
If you already have the JSON key as a dictionary, you can simply pass it to from_service_account_info(key).
Example:
key = json.load(open("JSON File Path")) # loading my JSON file into a dictionary
client = translate.TranslationServiceClient.from_service_account_info(key)
In your case you already have the key as a dictionary.
As for the error you are getting, I believe it has something to do with the temp file: the client library needs to be able to open and read the file that GOOGLE_APPLICATION_CREDENTIALS points to.
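One possible cause, offered as an assumption rather than a confirmed diagnosis: on Windows, a file created with tempfile.NamedTemporaryFile cannot be opened a second time by name while it is still open, so a library that tries to read f.name can fail with PermissionError. A sketch that avoids this by closing the file before anything else reads it (the helper name write_key_file is hypothetical):

```python
import json
import os
import tempfile


def write_key_file(key: dict) -> str:
    """Write the key dict to a closed temp file and return its path."""
    # delete=False keeps the file on disk once the with-block closes it,
    # so another reader can open it by name on any platform.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
        json.dump(key, f)
    return f.name
```

The returned path could then be assigned to os.environ['GOOGLE_APPLICATION_CREDENTIALS'] before the client is created, and removed with os.remove once the client exists. Passing the dictionary straight to from_service_account_info, as described above, avoids the file altogether.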

Uploading file with python returns Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>

blob.upload_from_filename(source) gives the error
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.Forbidden: 403 POST https://www.googleapis.com/upload/storage/v1/b/bucket1-newsdata-bluetechsoft/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
I am following the Google Cloud example written in Python here!
from google.cloud import storage
def upload_blob(bucket, source, des):
    client = storage.Client.from_service_account_json('/path')
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket)
    blob = bucket.blob(des)
    blob.upload_from_filename(source)
I used gsutil to upload files, which is working fine.
Tried to list the bucket names using the python script which is also working fine.
I have necessary permissions and GOOGLE_APPLICATION_CREDENTIALS set.
This wasn't working because the service account I am using in GCP didn't have the Storage Admin role.
Granting Storage Admin to my service account solved my problem.
As other answers have indicated, this is a permissions issue. I have found the following command a useful way to create application default credentials for the currently logged-in user.
Assuming you got this error while running the code on some machine, the following steps should be sufficient:
SSH into the VM where the code is running or will run. Make sure you are logged in as a user who has permission to upload to Google Cloud Storage.
Run the following command:
gcloud auth application-default login
The command asks you to create a token by opening a URL. Generate the token and paste it into the SSH console.
That's it. All Python applications started as that user will use this as the default credential when interacting with storage buckets.
Happy GCP'ing :)
This question is more appropriate for a support case.
As you are getting a 403, you are most likely missing an IAM permission. The Google Cloud Platform support team will be able to inspect your resources and configuration.
This is what worked for me when the google documentation didn't work. I was getting the same error with the appropriate permissions.
import pathlib
import google.cloud.storage as gcs
client = gcs.Client()
#set target file to write to
target = pathlib.Path("local_file.txt")
#set file to download
FULL_FILE_PATH = "gs://bucket_name/folder_name/file_name.txt"
#open filestream with write permissions
with target.open(mode="wb") as downloaded_file:
    #download and write file locally
    client.download_blob_to_file(FULL_FILE_PATH, downloaded_file)

Stackdriver Google Python API Access Denied

When trying to create a sink using the Google Cloud Python3 API Client I get the error:
RetryError: GaxError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.PERMISSION_DENIED, The caller does not have permission)>)
The code I used was this one:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path_to_json_secrets.json'
from google.cloud.bigquery.client import Client as bqClient
bqclient = bqClient()
ds = bqclient.dataset('dataset_name')
print(ds.access_grants)
[]
ds.delete()
ds.create()
print(ds.access_grants)
[<AccessGrant: role=WRITER, specialGroup=projectWriters>,
 <AccessGrant: role=OWNER, specialGroup=projectOwners>,
 <AccessGrant: role=OWNER, userByEmail=id_1@id_2.iam.gserviceaccount.com>,
 <AccessGrant: role=READER, specialGroup=projectReaders>]
from google.cloud.logging.client import Client as lClient
lclient = lClient()
dest = 'bigquery.googleapis.com%s' %(ds.path)
sink = lclient.sink('sink_test', filter_='jsonPayload.project=project_name', destination=dest)
sink.create()
Don't quite understand why this is happening. When I use lclient.log_struct() I can see the logs arriving in the Logging console so I do have access to Stackdriver Logging.
Is there any mistake in this setup?
Thanks in advance.
Creating a sink requires different permissions than writing a log entry. By default service accounts are given project Editor (not Owner), which does not have permission to create sinks.
See the list of permissions required in the access control docs.
Make sure the service account you're using has the logging.sinks.create permission. The simplest way to do this is to switch the service account from Editor to Owner, but it would be better to add the Logs Editor role so that you give it only the permission it needs.

Unauthorized error in GAE SDK, but it works once deployed

I am running this code in a small example:
import os
from google.cloud import storage
from google.appengine.api import app_identity
class TestB(base_handler.TRNHandler):
    #...
    def post(self):
        client = storage.Client()
        bucket_name = os.environ.get('BUCKET_NAME',
                                     app_identity.get_default_gcs_bucket_name())
        bucket = client.get_bucket(bucket_name)
        #...
If I deploy this code everything works as expected. But when I run it locally (SDK), I get an error: Unauthorized: 401 Invalid Credentials. What's happening and how can I fix it?
I've got a pretty strong guess, although I can't be sure without seeing your exact logs and whatnot.
The google.cloud library is smart about authorization. It uses a thing called "application default credentials." If you run that code on App Engine or on a GCE instance, the code will be able to figure out which service account is associated with that instance and authorize itself with the credentials of that account.
However, when you run the program locally, the library has no way of knowing which credentials to use, and so it just makes calls anonymously. Your bucket probably hasn't granted anonymous users access (which is good), and so the call fails with a 401.
You can, however, register credentials locally with the gcloud command:
$> gcloud auth application-default login
Run that, and for a while the library will use the credentials you logged in with. Alternatively, you could make sure that the environment variable GOOGLE_APPLICATION_CREDENTIALS points to a service account's JSON key file.
There's a bunch of documentation on exactly how Application Default Credentials pick a credential.
Alternatively, if you'd prefer to specify auth right in the program, you can do that too:
client = storage.Client.from_service_account_json('/path/to/key_file.json')
