I have a list of bucket names that I need to enable logging on programmatically. I am using boto3 and lambda. I can't seem to find the correct function in boto3/s3 to do what I need although I'm sure it's possible. Here is my code so far:
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # TODO implement
    # print("handler:event")
    # print(event)
    # bucketDump()
    setBucketPolicy(bucketDump())

def bucketDump():
    ## This function lists all existing buckets within an AWS account (Tommy's Personal Account)
    response = s3.list_buckets()
    buckets = []
    for bucket in response['Buckets']:
        value = bucket["Name"]
        buckets.append(value)
    return buckets

## setting a bucket policy
def setBucketPolicy(buckets):
    for bucket in buckets:
        value = s3.get_bucket_logging(Bucket=bucket)
        print(value)
        ## TODO if a bucket in buckets does not have logging enabled, enable it!
        # print(bucket)
My goal is to iterate over my list of buckets and enable logging on each of them. Thank you in advance.
As suggested by @jordanm in the comment below your question, using a resource instead of the client would make your life much easier, as it provides a higher-level interface.
If the only goal of bucketDump in your question is to retrieve all the buckets in your account, then you can remove it entirely and use the standard s3.buckets.all(), which already returns an iterable of buckets (docs).
Assuming that you want to enable logging on all your buckets that don't have it already enabled, and that you want to deliver logs from all the buckets to the same bucket, you could add a parameter to the `setBucketPolicy` function to specify this bucket. The implementation suggested below will enable logging and result in logs being organized like so:
- name_of_bucket_in_wich_to_store_logs
  - bucket_name_1
    - logs
  - bucket_name_2
    - logs
If you want to organize your logs differently you have to play with the TargetBucket and TargetPrefix parameters and, if needed, you can specify other parameters for grants as detailed in the docs.
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    # TODO implement
    setBucketPolicy(target_bucket='name_of_bucket_in_wich_to_store_logs')

def setBucketPolicy(target_bucket: str):
    for bucket in s3.buckets.all():
        bucket_logging = s3.BucketLogging(bucket.name)
        if not bucket_logging.logging_enabled:
            bucket_logging.put(
                BucketLoggingStatus={
                    'LoggingEnabled': {
                        'TargetBucket': target_bucket,
                        'TargetPrefix': f'{bucket.name}/'
                    }
                }
            )
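If you want to double-check the result afterwards, a minimal sketch using the low-level client (the bucket name is a placeholder):

import boto3

client = boto3.client('s3')
# Once logging has been enabled, the response contains a 'LoggingEnabled' block
print(client.get_bucket_logging(Bucket='bucket_name_1'))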
I am uploading a file to a Cloud Storage bucket using the Python SDK:
from google.cloud import storage
bucket = storage.Client().get_bucket('mybucket')
df = # pandas df to save
csv = df.to_csv(index=False)
output = 'test.csv'
blob = bucket.blob(output)
blob.upload_from_string(csv)
How can I get the response to know if the file was uploaded successfully? I need to log the response to notify the user about the operation.
I tried with:
response = blob.upload_from_string(csv)
but it always returns None, even when the operation has succeeded.
You can try with the tqdm library.
import os

from google.cloud import storage
from tqdm import tqdm

def upload_function(client, bucket_name, source, dest, content_type=None):
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(dest)
    with open(source, "rb") as in_file:
        total_bytes = os.fstat(in_file.fileno()).st_size
        with tqdm.wrapattr(in_file, "read", total=total_bytes, miniters=1,
                           desc="upload to %s" % bucket_name) as file_obj:
            blob.upload_from_file(file_obj, content_type=content_type, size=total_bytes)
    return blob

if __name__ == "__main__":
    # Placeholder local path and blob name
    upload_function(storage.Client(), "bucket", r"C:\files\blob.txt", "blob.txt", "text/plain")
Regarding how to get notifications about changes made to your buckets, there are a few ways you could also try:
Using Pub/Sub - This is the recommended way: Cloud Storage sends information about changes to objects in your buckets to a Pub/Sub topic of your choice in the form of messages. Here you will find an example using Python, as in your case, as well as other ways such as gsutil, other supported languages, or the REST APIs (a minimal Python sketch also follows this list).
Object change notification with Watchbucket: this creates a notification channel that sends notification events to a given application URL for a given bucket, using a gsutil command.
Cloud Functions with Google Cloud Storage triggers: event-driven functions handle events from Google Cloud Storage; you configure these notifications to trigger in response to various events inside a bucket (object creation, deletion, archiving and metadata updates). Here there is some documentation on how to implement it.
Another way is using Eventarc to build an event-driven architecture; it offers a standardized solution to manage the flow of state changes, called events, between decoupled microservices. Eventarc routes these events to Cloud Run while managing delivery, security, authorization, observability, and error handling for you. Here there is a guide on how to implement it.
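For the Pub/Sub route, a minimal sketch with the Python client; the bucket and topic names are placeholders, and the topic must already exist with Cloud Storage allowed to publish to it:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# Publish OBJECT_FINALIZE events (new or overwritten objects) to an existing Pub/Sub topic
notification = bucket.notification(
    topic_name="my-topic",            # placeholder topic name
    event_types=["OBJECT_FINALIZE"],
    payload_format="JSON_API_V1",
)
notification.create()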
Here you'll be able to find related posts with the same issue and answers:
Using Storage-triggered Cloud Function.
With Object Change Notification and Cloud Pub/Sub Notifications for Cloud Storage.
Answer with a Cloud Pub/Sub topic example.
You can check whether the upload raises any error and, if so, use the exception's response attributes:
def upload(blob, content):
    try:
        blob.upload_from_string(content)
    except Exception as e:
        # assumes the raised error exposes the underlying HTTP response,
        # as the google-api-core exceptions do
        status_code = e.response.status_code
        status_desc = e.response.json()['error']['message']
    else:
        status_code = 200
        status_desc = 'success'
    finally:
        return status_code, status_desc
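A quick usage sketch on top of the snippet from the question (bucket and object names are the ones used there):

from google.cloud import storage

bucket = storage.Client().get_bucket('mybucket')
blob = bucket.blob('test.csv')

status_code, status_desc = upload(blob, csv)  # csv is the string built from the DataFrame
print(status_code, status_desc)               # e.g. 200 success, or the error code and message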
Refs:
https://googleapis.dev/python/google-api-core/latest/_modules/google/api_core/exceptions.html
https://docs.python.org/3/tutorial/errors.html
I am posting this here because I found it really hard to find a function to get all objects from our S3 bucket using Python. When I searched for a get_object_data function, I was directed to the function for downloading an object instead.
So, how do we get the data of all the objects in our AWS S3 bucket using boto3 (the AWS SDK for Python)?
Import boto3 in your Python shell.
Make a connection to your AWS account and specify the resource (an S3 bucket here) you want to access
(make sure that the IAM credentials you are using have access to that resource).
Get the data required.
The code looks something like this:
import boto3

s3_resource = boto3.resource(service_name='s3',
                             region_name='<your bucket region>',
                             aws_access_key_id='<your access key id>',
                             aws_secret_access_key='<your secret access key>')

a = s3_resource.Bucket('<your bucket name>')

for obj in a.objects.all():
    # object URL
    print("https://<your bucket name>.s3.<your bucket region>.amazonaws.com/" + obj.key)
    # if you want to print all the data of the object, just print obj
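If you want the actual contents of each object rather than just its key or URL, a minimal sketch (note that this reads each object fully into memory):

for obj in a.objects.all():
    body = obj.get()['Body'].read()  # bytes of the object
    print(obj.key, len(body))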
I'm trying to enable Transfer Acceleration for some AWS S3 buckets.
I start up my client session:
client = boto3.client(
    "s3",
    aws_access_key_id=environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=environ.get("AWS_SECRET_ACCESS_KEY")
)
Then I turn Transfer Acceleration on through the S3 console, and have ensured it is enabled and turned on in the code as such:
response = client.put_bucket_accelerate_configuration(
    Bucket='string',
    AccelerateConfiguration={
        'Status': 'Enabled'
    }
)
and
response = client.get_bucket_accelerate_configuration(
    Bucket='string'
)
Both snippets come straight from the boto3 docs. I am able to upload to the bucket successfully later on in the code with:
client.upload_fileobj(data, environ.get("AWS_S3_BUCKET"), 'key')
I tried setting the endpoint_url param while starting the client session, but this just created a new folder (with my bucket title) inside my bucket.
It seems that boto3 is the only SDK that doesn't have some sort of "use transfer acceleration endpoint" flag. I know it is enabled on the bucket, and I have proof of that, but I have no proof that it is actually using the endpoint.
I've tried going through client metadata, bucket metadata, and every other client method that returns any sort of data, and I can't find proof that it actually used the acceleration endpoint.
Am I missing something?
Connect to S3 accelerate endpoint with boto3 mentions using:
Config(s3={"use_accelerate_endpoint": True})
This parameter is listed in Config Reference — botocore documentation:
s3 (dict)
use_accelerate_endpoint -- Refers to whether to use the S3 Accelerate endpoint. The value must be a boolean. If True, the client will use the S3 Accelerate endpoint. If the S3 Accelerate endpoint is being used then the addressing style will always be virtual.
So try using:
from botocore.config import Config

s3_client = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
With boto I used to specify my credentials when connecting to S3 like this:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
With boto3, all the examples I found look like this:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't find a way to specify my credentials, so all attempts fail with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
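After that you can run the same operation as in the question, e.g. deleting an object:

s3.Object(bucket_name, key_name).delete()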
You can get a client with a new session directly, like below.
s3_client = boto3.client('s3',
                         aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
                         aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
                         region_name=REGION_NAME
                         )
This is older, but placing it here for my reference too. boto3.resource just uses the default session, so you can pass the same session details through to boto3.resource.
Help on function resource in module boto3:
resource(*args, **kwargs)
Create a resource service client by name using the default session.
See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
You can see that it just takes the same arguments as boto3.Session.
import boto3

S3 = boto3.resource('s3', region_name='us-west-2',
                    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
                    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY)
S3.Object(bucket_name, key_name).delete()
I'd like to expand on @JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to get the CLI set up from the terminal:
$> pip install awscli  # can add the --user flag
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:
After this you can use boto3 and any of the APIs without having to specify keys (unless you want to use different credentials).
If you rely on your .aws/credentials to store id and key for a user, it will be picked up automatically.
For instance
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region = ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can also set the default AWS environment variables for the secret and access keys; that way you don't need to change the default client-creation code, though it is better to pass them as parameters if you have non-default credentials.
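For example, a minimal sketch of the environment-variable route (values are placeholders); boto3 reads these when the client or resource is created:

import os
import boto3

# Placeholders; in practice these would be set in the shell or the Lambda configuration
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

s3 = boto3.resource('s3')  # picks up the credentials from the environment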
I'm trying to get the CPU utilization for the EC2 instances in an account. My code is like the following:
def GetRegions():
    return array of regions

def getEC2InstanceID(RegionName):
    cloudwatch = boto3.client('cloudwatch', region_name=RegionName)
    response = cloudwatch.get_metric_statistics(
        .
        .
        .)
    returns array of ec2instanceID

def EC2_Average_Utilization(InstanceID, RegionName):
    returns avg cpuusage

def main():
    regions = GetRegions()
    for i in range(len(regions)):
        print(regions[i])
        instance_id = getEC2InstanceID(regions[i])
        print(instance_id)  # prints all the instances if there are any
        if (type(instance_id) == list):
            for j in range(len(instance_id)):
                print(instance_id[j])
                print("For InstanceID " + instance_id[j] + ":")
                EC2_Average_Utilization(instance_id[j], regions[i])
This code executes perfectly for all the regions under only one account. If I want to do the same thing for multiple AWS accounts, what will be the procedure?
N.B. I've seen suggestions to configure .aws/config by creating multiple profiles, one per account, in .aws/credentials, but as I'm generating the regions in the code, I don't want to specify them there.
You will need to use a boto3 Session object, the 'Security Token Service (STS)', and a call to assume_role for each account/region combo. The effect is the same as the named profile - you need a role in each account with adequate permissions to call the API methods (EC2, CloudWatch, etc). Also, the target roles need a trust relationship back to the original account credentials.
sts = boto3.client('sts')

# This is called with your default credentials. Target roles need to trust this identity.
creds = sts.assume_role(RoleArn='...', RoleSessionName='...')

# Set up a session with the temporary credentials
session = boto3.Session(
    aws_access_key_id=creds['Credentials']['AccessKeyId'],
    aws_secret_access_key=creds['Credentials']['SecretAccessKey'],
    aws_session_token=creds['Credentials']['SessionToken'],
    region_name='...')

# All subsequent clients/resources should be instantiated from the session object
cloudwatch = session.client('cloudwatch')
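To cover multiple accounts, a hedged sketch of wrapping this in a loop over role ARNs; the ARNs, role name, and helper function are placeholders, not a definitive implementation:

import boto3

def session_for_role(role_arn, session_name='cross-account-metrics'):
    # Assume the given role and build a Session from its temporary credentials
    sts = boto3.client('sts')
    creds = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)['Credentials']
    return boto3.Session(
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'])

# Hypothetical role ARNs, one per account to inspect
role_arns = ['arn:aws:iam::111111111111:role/MetricsReader',
             'arn:aws:iam::222222222222:role/MetricsReader']

for arn in role_arns:
    session = session_for_role(arn)
    for region in session.get_available_regions('ec2'):
        cloudwatch = session.client('cloudwatch', region_name=region)
        # ...reuse the getEC2InstanceID / EC2_Average_Utilization logic with this client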
Hope this helps.
See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html#STS.Client.assume_role