How to use boto3 on EC2 instance without local configuration? - python

My goal is to be able to run a Python program that uses boto3 to access DynamoDB without any local configuration. I've been following this AWS document (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) and it seems to be feasible using the 'IAM role' option (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#iam-role), which means nothing is configured locally.
However, after attaching a role with DynamoDB access permission to the EC2 instance the Python program runs on and calling boto3.resource('dynamodb'), I kept getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/__init__.py", line 100, in resource
return _get_default_session().resource(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/session.py", line 389, in resource
aws_session_token=aws_session_token, config=config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/session.py", line 263, in client
aws_session_token=aws_session_token, config=config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/session.py", line 839, in create_client
client_config=config, api_version=api_version)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 86, in create_client
verify, credentials, scoped_config, client_config, endpoint_bridge)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 328, in _get_client_args
verify, credentials, scoped_config, client_config, endpoint_bridge)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/args.py", line 47, in get_client_args
endpoint_url, is_secure, scoped_config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/args.py", line 117, in compute_client_args
service_name, region_name, endpoint_url, is_secure)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 402, in resolve
service_name, region_name)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/regions.py", line 122, in construct_endpoint
partition, service_name, region_name)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/regions.py", line 135, in _endpoint_for_partition
raise NoRegionError()
botocore.exceptions.NoRegionError: You must specify a region.
I've searched the internet and most of the solutions point to having local configuration (e.g. ~/.aws/config, a boto3 config file, etc.).
Also, I have verified that from EC2 instance, I am able to get the region from instance metadata:
$ curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document
{
...
"region" : "us-east-2",
...
}
My workaround right now is to provide the environment variable AWS_DEFAULT_REGION via the Docker command line.
Here is the simple code I have to replicate the issue:
>>> import boto3
>>> dynamodb = boto3.resource('dynamodb')
I expected boto3 to somehow pick up the region that is already available on the EC2 instance.

There are two types of configuration data in boto3: credentials and non-credentials (including region). How boto3 reads them differs.
See:
Configuring Credentials
https://github.com/boto/boto3/issues/375
Specifically, boto3 retrieves credentials from the EC2 instance metadata service, but it does not read other configuration items (such as the region) from there.
So you need to indicate which region you want. You can retrieve the current region from the instance metadata and use it, if appropriate, or set the environment variable AWS_DEFAULT_REGION.
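For example, here is a minimal sketch (not from the original answer) that reads the region from the instance-identity document (the same endpoint queried with curl in the question) and passes it to boto3 explicitly; it assumes IMDSv1 is reachable from the program:
import json
import urllib.request

import boto3

# Instance-identity document endpoint (IMDSv1), as shown in the question.
IDENTITY_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"

def current_region():
    # The document is JSON and contains a "region" field, e.g. "us-east-2".
    with urllib.request.urlopen(IDENTITY_URL, timeout=2) as resp:
        return json.load(resp)["region"]

dynamodb = boto3.resource("dynamodb", region_name=current_region())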

You can pass the region as a parameter to any boto3 resource:
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
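If you create several clients or resources, you can also set the region once on a session instead of repeating it per call; a small sketch (the table name below is hypothetical):
import boto3

# Every client/resource created from this session inherits the region.
session = boto3.session.Session(region_name="us-east-2")
dynamodb = session.resource("dynamodb")
table = dynamodb.Table("YourTable")  # hypothetical table name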

Related

GKE pod not able to access private GCS Bucket (401 unauthorized error)

I am developing an application in Python which will be deployed on a Google Kubernetes Engine (GKE) pod. The application writes and reads .csv files to Google Cloud Storage (a private bucket). I am facing an error while trying to read/write files to the bucket. Reading and writing to the private bucket works when I run the application on my local system.
The operation fails when the application is deployed to the GKE pod.
The GKE pod is not able to access the private GCS bucket even though I am providing the same credentials as on my local system. Following are some details about the application and the error I am facing:
Dockerfile: The Dockerfile contains a reference to the cred.json file, which holds the credentials of the Google Cloud service account.
The service account has the Storage Admin role.
FROM python:3.9.10-slim-buster
WORKDIR /pipeline
COPY . .
RUN pip3 install --no-cache-dir -r requirements.txt
EXPOSE 3000
ENV GOOGLE_APPLICATION_CREDENTIALS=/pipeline/cred.json
ENV GIT_PYTHON_REFRESH=quiet
requirements.txt: Following is the requirements.txt content (only the Google Cloud related packages are included, as they are relevant to the error):
google-api-core==2.8.2
google-auth==2.9.0
google-auth-oauthlib==0.5.2
google-cloud-bigquery==3.2.0
google-cloud-bigquery-storage==2.11.0
google-cloud-core==2.3.1
google-cloud-storage==2.4.0
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.3
fsspec==2022.8.2
gcsfs==2022.8.2
gevent==21.12.0
Error details: Following is the traceback:
Anonymous caller does not have storage.objects.create access to the Google Cloud Storage bucket., 401
ERROR:gcsfs:_request non-retriable exception: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage bucket., 401
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/gcsfs/retry.py", line 115, in retry_request
return await func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/gcsfs/core.py", line 384, in _request
validate_response(status, contents, path, args)
File "/usr/local/lib/python3.9/site-packages/gcsfs/retry.py", line 102, in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage bucket., 401
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/pipeline/training/train.py", line 133, in training
X.to_csv(file_name, index=False)
File "/usr/local/lib/python3.9/site-packages/pandas/core/generic.py", line 3563, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
csv_formatter.save()
File "/usr/local/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 261, in save
self._save()
File "/usr/local/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 266, in _save
self._save_body()
File "/usr/local/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 304, in _save_body
self._save_chunk(start_i, end_i)
File "/usr/local/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
libwriters.write_csv_rows(
File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1491, in write
self.flush()
File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1527, in flush
self._initiate_upload()
File "/usr/local/lib/python3.9/site-packages/gcsfs/core.py", line 1443, in _initiate_upload
self.location = sync(
File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
raise return_result
File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/usr/local/lib/python3.9/site-packages/gcsfs/core.py", line 1559, in initiate_upload
headers, _ = await fs._call(
File "/usr/local/lib/python3.9/site-packages/gcsfs/core.py", line 392, in _call
status, headers, info, contents = await self._request(
File "/usr/local/lib/python3.9/site-packages/decorator.py", line 221, in fun
return await caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.9/site-packages/gcsfs/retry.py", line 152, in retry_request
raise e
File "/usr/local/lib/python3.9/site-packages/gcsfs/retry.py", line 115, in retry_request
return await func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/gcsfs/core.py", line 384, in _request
validate_response(status, contents, path, args)
File "/usr/local/lib/python3.9/site-packages/gcsfs/retry.py", line 102, in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage bucket., 401
I also tried running the application after making the bucket public. With that approach, the read and write operations work.
The problem arises when the bucket is private (which is essential for the application deployment).
Any help to resolve this error will be appreciated. Thanks in advance!!
You are able to read/write from your local system because you are probably using your own credentials or impersonating a service account (SA) that has permission to access the private bucket. FYI: if you are accessing a bucket cross-project, the SA must be granted the required permission in the project the bucket is in.
One thing you can do is grant the SA that runs the GKE pod the required permission (instead of explicitly setting GOOGLE_APPLICATION_CREDENTIALS) and then obtain the credentials with google.auth.default() wherever needed.
PS: If the SA running your GKE pod already has storage access in the project the bucket is in, you should be just fine.
Hope this helps :)
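A minimal sketch of that approach (not the original poster's code), assuming the pod's service account already has access to the bucket, so no key file or GOOGLE_APPLICATION_CREDENTIALS is needed; the bucket name is hypothetical:
import google.auth
from google.cloud import storage

# Picks up the ambient credentials of the pod's service account
# (e.g. via Workload Identity) instead of a mounted key file.
credentials, project_id = google.auth.default()

client = storage.Client(project=project_id, credentials=credentials)
bucket = client.bucket("my-private-bucket")  # hypothetical bucket name
bucket.blob("output/data.csv").upload_from_filename("data.csv")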

403 Request had insufficient authentication issues while accessing Secrets on GCP within a container

I am trying to access a secret on GCP Secrets and I get the following error :
in get_total_results
"api_key": get_credentials("somekey").get("somekey within key"),
File "/helper.py", line 153, in get_credentials
response = client.access_secret_version(request={"name": resource_name})
File "/usr/local/lib/python3.8/site-packages/google/cloud/secretmanager_v1/services/secret_manager_service/client.py", line 1136, in access_secret_version
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/usr/local/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 285, in retry_wrapped_func
return retry_target(
File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 188, in retry_target
return target()
File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes.
The code is fairly simple:
def get_credentials(secret_id):
    project_id = os.environ.get("PROJECT_ID")
    resource_name = f"projects/{project_id}/secrets/{secret_id}/versions/1"
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(request={"name": resource_name})
    secret_string = response.payload.data.decode("UTF-8")
    secret_dict = json.loads(secret_string)
    return secret_dict
So, what I have is a Cloud Function, deployed using triggers, that uses a service account with the Owner role.
The Cloud Function triggers a Kubernetes job that creates a container, which downloads a repo inside the container and executes it.
Dockerfile is:
FROM gcr.io/project/repo:latest
FROM python:3.8-slim-buster
COPY . /some_dir
WORKDIR /some_dir
COPY --from=0 ./repo /a_repo
RUN pip install -r requirements.txt & pip install -r a_repo/requirements.txt
ENTRYPOINT ["python3" , "main.py"]
The GCE instance might not have the correct authentication scope.
From: https://developers.google.com/identity/protocols/oauth2/scopes#secretmanager
https://www.googleapis.com/auth/cloud-platform is the required scope.
When creating the GCE instance, you need to select the access-scopes option that gives the instance permission to call out to Cloud APIs.
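As a quick diagnostic (not part of the original answer), you can ask the metadata server which scopes the instance's default service account actually has:
import urllib.request

# GCE metadata server; only reachable from inside the instance/container.
SCOPES_URL = ("http://metadata.google.internal/computeMetadata/v1/"
              "instance/service-accounts/default/scopes")

request = urllib.request.Request(SCOPES_URL, headers={"Metadata-Flavor": "Google"})
with urllib.request.urlopen(request, timeout=2) as response:
    # Should list https://www.googleapis.com/auth/cloud-platform for Secret Manager calls.
    print(response.read().decode())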

Google Cloud DataFlow job throws alert after few hours

Running a Dataflow streaming job using the 2.11.0 release.
I get the following authentication error after a few hours:
File "streaming_twitter.py", line 188, in <lambda>
File "streaming_twitter.py", line 102, in estimate
File "streaming_twitter.py", line 84, in estimate_aiplatform
File "streaming_twitter.py", line 42, in get_service
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 227, in build credentials=credentials)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 363, in build_from_document credentials = _auth.default_credentials()
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_auth.py", line 42, in default_credentials credentials, _ = google.auth.default()
File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 306, in default raise exceptions.DefaultCredentialsError(_HELP_MESSAGE) DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application.
This Dataflow job performs an API request to AI Platform Prediction, and it seems the authentication token is expiring.
Code snippet:
def get_service():
    # If it hasn't been instantiated yet: do it now
    return discovery.build('ml', 'v1',
                           discoveryServiceUrl=DISCOVERY_SERVICE,
                           cache_discovery=True)
I tried adding the following to the service function:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/key.json"
But I get:
DefaultCredentialsError: File "/tmp/key.json" was not found. [while running 'generatedPtransform-930']
I assume this is because the file is not present on the Dataflow worker machines.
Another option is to use the developerKey param in the build method, but it doesn't seem to be supported by AI Platform Prediction; I get this error:
Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."> [while running 'generatedPtransform-22624']
I'm looking to understand how to fix this and what the best practice is.
Any suggestions?
Complete logs here
Complete code here
Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json' only works locally with the DirectRunner. Once you deploy to a distributed runner like Dataflow, each worker won't be able to find the local file /tmp/key.json.
If you want each worker to use a specific service account, you can tell Beam which service account to use to identify workers.
First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download the service account key file :)
Then, if you're letting PipelineOptions parse your command-line arguments, you can simply use the service_account_email option and specify it like --service_account_email your-email@your-project.iam.gserviceaccount.com when running your pipeline.
The service account pointed to by your GOOGLE_APPLICATION_CREDENTIALS is only used to start the job; each worker uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email from your GOOGLE_APPLICATION_CREDENTIALS file.
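A minimal sketch of passing that option in Python (all values below are placeholders, not taken from the question):
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=your-project",                 # placeholder
    "--region=us-central1",                   # placeholder
    "--temp_location=gs://your-bucket/tmp",   # placeholder
    # Workers run as this SA; it needs roles/dataflow.worker.
    "--service_account_email=dataflow-worker@your-project.iam.gserviceaccount.com",
])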

Accessing google cloud storage bucket from cloud functions throws 500 error

I'm trying to access a Google Cloud Storage bucket from a Cloud Functions (Python) instance and it's throwing a cryptic 500 error.
I have given the service account the Editor role too. It didn't make any difference.
I also checked whether any quota was being exceeded. The limits were not even close.
Can anyone help me find the cause of this error?
Here is the code:
from google.cloud import storage
import os
import base64

storage_client = storage.Client()

def init_analysis(event, context):
    print("event", event)
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    print(pubsub_message)
    bucket_name = 'my-bucket'
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs()
    for blob in blobs:
        print(blob.name)
Error:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/credentials.py", line 99, in refresh
service_account=self._service_account_email)
File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/_metadata.py", line 208, in get_service_account_token
'instance/service-accounts/{0}/token'.format(service_account))
File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/_metadata.py", line 140, in get
url, response.status, response.data), response)
google.auth.exceptions.TransportError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/my-project@appspot.gserviceaccount.com/token from the Google Compute Engine metadata service. Status: 500 Response:\nb'Could not fetch URI /computeMetadata/v1/instance/service-accounts/my-project@appspot.gserviceaccount.com/token\\n'", <google.auth.transport.requests._Response object at 0x2b0ef9edf438>)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 383, in run_background_function
_function_handler.invoke_user_function(event_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 214, in call_user_function
event_context.Context(**request_or_event.context))
File "/user_code/main.py", line 21, in init_analysis
bucket = storage_client.get_bucket(bucket_name)
File "/env/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 227, in get_bucket
bucket.reload(client=self)
File "/env/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 130, in reload
_target_object=self,
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 315, in api_request
target_object=_target_object,
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 192, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 221, in _do_request
return self.http.request(url=url, method=method, headers=headers, data=data)
File "/env/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 205, in request
self._auth_request, method, url, request_headers)
File "/env/local/lib/python3.7/site-packages/google/auth/credentials.py", line 122, in before_request
self.refresh(request)
File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/my-project@appspot.gserviceaccount.com/token from the Google Compute Engine metadata service. Status: 500 Response:\nb'Could not fetch URI /computeMetadata/v1/instance/service-accounts/my-project@appspot.gserviceaccount.com/token\\n'", <google.auth.transport.requests._Response object at 0x2b0ef9edf438>)
The error you are getting is because your Cloud Functions service account doesn't have the cloudfunctions.serviceAgent role. As you can see in the documentation:
Authenticating as the runtime service account from inside your function may fail if you change the Cloud Functions service account's permissions.
However, I found that sometimes you cannot add this role, as it doesn't show up as an option. I have reported this issue to the Google Cloud Functions engineering team and they are working to solve it.
Nevertheless, you can add the role again using this gcloud command:
gcloud projects add-iam-policy-binding <project_name> --role=roles/cloudfunctions.serviceAgent --member=serviceAccount:service-<project_number>@gcf-admin-robot.iam.gserviceaccount.com

can not authenticate with gcs in python

I am following the example in https://developers.google.com/storage/docs/gspythonlibrary#credentials
I created a client/secret pair by choosing in the developer console "Create new Client ID", "Installed application", "Other".
I have the following code in my python script:
import boto
from gcs_oauth2_boto_plugin.oauth2_helper import SetFallbackClientIdAndSecret
CLIENT_ID = 'my_client_id'
CLIENT_SECRET = 'xxxfoo'
SetFallbackClientIdAndSecret(CLIENT_ID, CLIENT_SECRET)
uri = boto.storage_uri('foobartest2014', 'gs')
header_values = {"x-goog-project-id": proj_id}
uri.create_bucket(headers=header_values)
and it fails with the following error:
File "/usr/local/lib/python2.7/dist-packages/boto/storage_uri.py", line 555, in create_bucket
conn = self.connect()
File "/usr/local/lib/python2.7/dist-packages/boto/storage_uri.py", line 140, in connect
**connection_args)
File "/usr/local/lib/python2.7/dist-packages/boto/gs/connection.py", line 47, in __init__
suppress_consec_slashes=suppress_consec_slashes)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 190, in __init__
validate_certs=validate_certs, profile_name=profile_name)
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 572, in __init__
host, config, self.provider, self._required_auth_capability())
File "/usr/local/lib/python2.7/dist-packages/boto/auth.py", line 883, in get_auth_handler
'Check your credentials' % (len(names), str(names)))
boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 3 handlers were checked. ['OAuth2Auth', 'OAuth2ServiceAccountAuth', 'HmacAuthV1Handler'] Check your credentials
I had been struggling with this for the last couple of days; it turns out that the boto approach and gspythonlibrary are totally obsolete.
The latest example code showing how to use/authenticate Google Cloud Storage is here:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/storage/api
You need to provide a client/secret pair in a .boto file, and then run gsutil config.
It will create a refresh token, and then should work!
For more info, see https://developers.google.com/storage/docs/gspythonlibrary#credentials
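For comparison, here is a minimal sketch using the current google-cloud-storage client instead of boto (not part of the original answer; it assumes Application Default Credentials are already configured, e.g. via gcloud auth application-default login, and the project id below is a placeholder):
from google.cloud import storage

# Application Default Credentials are used automatically; no client/secret pair needed.
client = storage.Client(project="your-project-id")   # placeholder project id
bucket = client.create_bucket("foobartest2014")      # bucket name from the question
print("Created bucket:", bucket.name)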
You can also build a console application for gsutil authentication: pass gsutil config -a through the application to the Cloud SDK and execute it, then run gsutil commands such as gsutil cp and gsutil rm the same way.
