The effective way to handle a missing file on S3 - Python

clients.py

"""
Wraps boto3 to provide client and resource objects that connect to Localstack when
running tests if the LOCALSTACK_URL env variable is set, otherwise connect to real AWS.
"""
import os

import boto3

# Port 4566 is the universal "edge" port for all services since localstack v0.11.0,
# so LOCALSTACK_URL can be of the form "http://hostname:4566".
# None means boto3 will use the default endpoint for the real AWS services.
ENDPOINT_URL = os.getenv("LOCALSTACK_URL", None)


def s3_resource():
    return boto3.resource("s3", endpoint_url=ENDPOINT_URL)


def s3_client():
    return boto3.client("s3", endpoint_url=ENDPOINT_URL)
mycode.py

import botocore.exceptions

from mymod.aws.clients import s3_resource

s3 = s3_resource()


def __get_object_etag(self, s3_dir_to_download_file_from: str, file_name: str) -> str:
    # deconstruct_s3_path is a custom helper (not part of boto3) that splits an
    # "s3://bucket/prefix/file"-style path into (bucket, key).
    bucket, key = s3.deconstruct_s3_path(
        f"{s3_dir_to_download_file_from}/{file_name}"
    )
    try:
        etag_value = s3_resource().Object(bucket, key).e_tag
        return etag_value
    except botocore.exceptions.ClientError:
        raise
I am wondering whether mycode.py does the correct error handling for the case where the file I am looking for does not exist on S3. I basically expect the key to be there, and if it is not I want to raise an error and stop; I do not want to proceed any further, because this code will be used as part of a pipeline where each step relies on the previous one.
I would love to know if I am handling the error correctly, and if not, how one would handle this case.
From what I understand, exceptions are supposed to handle errors and let you proceed, but in my case I do not want the rest of my code to run if the file I am looking for on S3 is missing.
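If the goal is simply to stop the pipeline when the key is missing, letting the ClientError propagate is already enough; a bare except ... raise adds nothing on its own. Below is a minimal sketch of a slightly more explicit variant, assuming the s3_resource() helper from clients.py above; the error-code check and the FileNotFoundError wrapper are illustrative choices, not part of the original code:

import botocore.exceptions

from mymod.aws.clients import s3_resource


def get_object_etag(bucket: str, key: str) -> str:
    """Return the ETag of s3://bucket/key, failing loudly if the object is missing."""
    try:
        return s3_resource().Object(bucket, key).e_tag
    except botocore.exceptions.ClientError as err:
        error_code = err.response["Error"]["Code"]
        if error_code in ("404", "NoSuchKey"):
            # Surface a clearer error so the pipeline stops right here.
            raise FileNotFoundError(f"s3://{bucket}/{key} does not exist") from err
        # Any other AWS error should also abort the pipeline.
        raise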

Related

Test S3 operations with simulated credentials

I'm writing unit tests for a function that should authenticate to AWS S3 and then perform some operations on S3. I have a bunch of functions that do various things (like downloading/uploading files, checking for existence, etc).
Since in production these will need authentication, I want to simulate it in the tests, so that I can check that this part of the code is ok too.
For mocking the AWS environment I'm using moto. As of now I have the following code, below. A quick intro to it: my_head_bucket is an example function that needs to have unit tests. create_user_with_access_key_and_policy should mock the IAM user and policy needed for authenticated S3 access. It is almost the same as in the examples in the documentation.
Then there are two tests. The first should pass without errors (having the correct authentication). The second should fail with ClientError, because "invalid" keys are being passed.
For some reason I am not able to pass through the creation of the mock user and policy, getting botocore.exceptions.ClientError: An error occurred (InvalidClientTokenId) when calling the AttachUserPolicy operation: The security token included in the request is invalid. It seems that no authentication should be needed for create_access_key, so what am I doing wrong?
import os
import unittest
import json

import boto3
import botocore
from botocore.client import ClientError
from moto import mock_s3, mock_iam
from moto.core import set_initial_no_auth_action_count


def my_head_bucket(bucket, aws_access_key_id, aws_secret_access_key):
    """
    This is a sample function. In the real case, this function will do more than
    just heading a bucket (like uploading/downloading files or other actions).
    It would be imported from another module and should not have any decorators (like moto's).
    """
    s3_client = boto3.client("s3", aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
    s3_client.head_bucket(Bucket=bucket)


@mock_iam
def create_user_with_access_key_and_policy(user_name="test-user"):
    """
    Should create a user with an attached policy allowing read/write operations on S3.
    """
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
        ],
    }
    # Create client and user
    client = boto3.client("iam", region_name="us-east-1")
    client.create_user(UserName=user_name)
    # Create and attach the policy
    policy_arn = client.create_policy(
        PolicyName="policy1", PolicyDocument=json.dumps(policy_document)
    )["Policy"]["Arn"]
    client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    # Return the access keys
    return client.create_access_key(UserName=user_name)["AccessKey"]


class TestMyTest(unittest.TestCase):

    @set_initial_no_auth_action_count(0)
    @mock_s3
    def test_correct_credentials(self):
        """
        Sets the environment (creates user with keys and policy, creates the bucket), then calls
        the function-to-be-tested and expects it to run without exceptions.
        """
        ### Arrange
        iam_keys = create_user_with_access_key_and_policy()
        print(iam_keys)
        s3 = boto3.client('s3', aws_access_key_id=iam_keys["AccessKeyId"], aws_secret_access_key=iam_keys["SecretAccessKey"])
        s3.create_bucket(Bucket='mock_bucket')
        my_head_bucket('mock_bucket', aws_access_key_id=iam_keys["AccessKeyId"], aws_secret_access_key=iam_keys["SecretAccessKey"])

    @set_initial_no_auth_action_count(0)
    @mock_s3
    def test_incorrect_credentials(self):
        """
        Sets the environment (creates user with keys and policy, creates the bucket), then calls
        the function-to-be-tested and expects it to raise ClientError because an invalid secret key is passed.
        """
        ### Arrange
        iam_keys = create_user_with_access_key_and_policy()
        print(iam_keys)
        s3 = boto3.client('s3', aws_access_key_id=iam_keys["AccessKeyId"], aws_secret_access_key=iam_keys["SecretAccessKey"])
        s3.create_bucket(Bucket='mock_bucket')
        with self.assertRaises(ClientError):
            my_head_bucket('mock_bucket', aws_access_key_id=iam_keys["AccessKeyId"], aws_secret_access_key="invalid")
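For reference, the InvalidClientTokenId error is consistent with how moto documents set_initial_no_auth_action_count: its argument is the number of API calls allowed to run before authentication starts being enforced, so with 0 even the four IAM bootstrap calls (create_user, create_policy, attach_user_policy, create_access_key) are rejected. A hedged sketch of the adjustment, assuming that documented behaviour; the count of 4 simply covers those four setup calls and may need tuning:

class TestMyTest(unittest.TestCase):

    @set_initial_no_auth_action_count(4)  # let the 4 unauthenticated IAM bootstrap calls through
    @mock_s3
    def test_correct_credentials(self):
        iam_keys = create_user_with_access_key_and_policy()
        # From here on, every request must be signed with the generated (mock) keys,
        # which the attached s3:* policy authorises.
        s3 = boto3.client("s3", aws_access_key_id=iam_keys["AccessKeyId"],
                          aws_secret_access_key=iam_keys["SecretAccessKey"])
        s3.create_bucket(Bucket="mock_bucket")
        my_head_bucket("mock_bucket",
                       aws_access_key_id=iam_keys["AccessKeyId"],
                       aws_secret_access_key=iam_keys["SecretAccessKey"])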

Fetch the latest file in a folder and upload to S3?

The filename variable is used to get the name of the latest file.
My aim is to monitor a folder and, whenever a new file arrives, automatically upload it to an S3 bucket using boto3.
import os
import time
from subprocess import call

import boto3
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

session = boto3.Session(aws_access_key_id='aws_access_key_id', aws_secret_access_key='aws_secret_access_key',
                        region_name='region_name')
s3 = session.client('s3')


class Watcher:
    def __init__(self):
        self.dir = os.path.abspath('D:\\project')
        self.observer = Observer()

    def run(self):
        event_handler = Handler()
        self.observer.schedule(event_handler, self.dir, recursive=True)
        self.observer.start()
        try:
            while True:
                time.sleep(5)
        except:
            self.observer.stop()
            print("Error")
        self.observer.join()


class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        elif event.event_type == 'created':
            print("Received created event - %s." % event.src_path)
            s3.upload_file(Filename=event.src_path, Bucket='bucketname', Key='test-file-1')


if __name__ == '__main__':
    w = Watcher()
    w.run()
FileNotFoundError: [WinError 2] The system cannot find the file specified
As @alexhall mentioned in the comments, the s3.meta.client.upload_file method will upload a file. You can read the boto3 S3 client's upload_file documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file. However, the example there is a bit odd: they first create an S3 resource rather than an S3 client, and then, because the resource does not actually have a method to upload a file, they fall back to the client via meta.client. You might as well create and use an S3 client directly for uploads.
You are also relying on the fact that boto3 uses the default session when you create the S3 resource like this:
boto3.resource('s3')
That works fine if you are running the code on Lambda, or on an EC2 instance with an IAM role configured for S3 access. But it sounds like you are running this outside AWS, in which case you can create a boto3.Session() first using your credentials, and then have a client (or resource) use that session.
aws_access_key_id = '<AWS_ACCESS_KEY_ID>'
aws_secret_access_key = '<AWS_SECRET_ACCESS_KEY>'
region_name = 'us-east-1'

session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key,
                        region_name=region_name)
s3 = session.client('s3')
You can read about Session configuration here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
As mentioned above, because you are only trying to upload a file and do not seem to do anything else with it, you may as well create an S3 client directly rather than creating an S3 resource and then reaching the client through 'meta.client'.
Then, instead of the command = ... line, you simply use:
s3.upload_file(Filename, Bucket='aaaaa', Key='test-file-1')
You can delete the last line. You would use 'call' only if you were running an OS/system command rather than something within Python.
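Putting those pieces together, a rough sketch of what the handler could look like with the session-based client; the bucket name 'aaaaa' is a placeholder, and using the file's own basename as the key is just one possible choice:

import os

from watchdog.events import FileSystemEventHandler


class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        if event.event_type == 'created':
            print("Received created event - %s." % event.src_path)
            # s3 is the session-based client created above; 'aaaaa' is a placeholder bucket.
            s3.upload_file(Filename=event.src_path,
                           Bucket='aaaaa',
                           Key=os.path.basename(event.src_path))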
Not sure if you are doing this to learn Python (boto3). If so, congrats.
If not, AWS already provides such a feature, so you could keep your code minimal and shell out to the AWS CLI instead.

How do I write a cloud function that monitors a storage bucket?

I have set up a Google Cloud Storage bucket to send notifications to a Pub/Sub topic:
gsutil notification create -t my-topic -f json gs://test-bucket
I have created a subscription to this topic to push messages to a cloud function endpoint:
gcloud pubsub subscriptions create my-sub --topic my-topic
And the cloud function is deployed with:
gcloud functions deploy promo_received --region europe-west1 --runtime python37 --trigger-topic my-topic
The purpose of the function (right now) is to check whether a file being created in the test-bucket matches a specific file name, and to fire a message off to Slack when it does. Currently the function looks like this:
import json
from datetime import datetime


def promo_received(data):
    date_str = datetime.today().strftime('%Y%m%d')
    filename = json.loads(data)["name"]
    bucket = json.loads(data)["bucket"]

    if filename == 'PROM_DTLS_{}.txt.gz'.format(date_str):
        msg = ":heavy_check_mark: *{}* has been uploaded to *{}*. Awaiting instructions.".format(filename, bucket)
        post_to_slack(url, msg)
When I test this by dropping a file named PROM_DTLS_20190913.txt.gz, I can see the function fire; however, it crashes with two errors:
TypeError: promo_received() takes 1 positional argument but 2 were given
TypeError: the JSON object must be str, bytes or bytearray, not LocalProxy
This is my first time attempting to do this, and I'm not sure where to start with troubleshooting. Any help would be greatly appreciated!
You need to add the context as an argument to your function; that will solve the first error:
def promo_received(data, context):
    [...]
Also, you don't need json.loads to retrieve the name of the file or the bucket:
data['name']
data['bucket']
This should get rid of the second error.
Check the example in the Google Cloud Storage Triggers documentation page.
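Putting the first answer's two fixes together, a minimal sketch; this assumes the event payload arrives as an already-parsed dict, as it does for functions triggered directly by Cloud Storage events (see the documentation page linked above), while the next answer covers the Pub/Sub-delivered case where the payload is base64-encoded JSON. post_to_slack and url are from the original code and assumed to be defined elsewhere:

from datetime import datetime


def promo_received(data, context):
    date_str = datetime.today().strftime('%Y%m%d')
    filename = data['name']
    bucket = data['bucket']

    if filename == 'PROM_DTLS_{}.txt.gz'.format(date_str):
        msg = ":heavy_check_mark: *{}* has been uploaded to *{}*. Awaiting instructions.".format(filename, bucket)
        post_to_slack(url, msg)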
To write a Python Cloud Function, look at this example. Note that Cloud Storage serializes the object into a utf-8 JSON string, which Cloud Functions then base64-encodes. So you need to first base64-decode the payload, then utf8-decode it, then JSON parse it.
import base64
import json


def promo_received(event, context):
    obj = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    filename = obj["name"]
    bucket = obj["bucket"]
    # the rest of your code goes here

Flask stream/multipart file from S3

I'm using Flask in an AWS API Gateway/Lambda environment (thanks to Zappa), but there is a limit on response size, so Flask's send_file is not enough in this context.
Is there a way I can stream/multipart (not sure if these are the correct terms) a file-like object as a response in Flask? I can't send response bodies of more than 5 MB (6 MB?) in the AWS serverless environment.
Current code (simple S3 proxy that deletes the object once downloaded):
@app.route('/polling/<key>')
def polling(key):
    obj = BytesIO()
    try:
        s3.download_fileobj('carusoapi', key, obj)
        s3.delete_object(Bucket='carusoapi', Key=key)
        return send_file(obj, as_attachment=True, attachment_filename=key)
    except Exception:
        return 'File not ready yet', 204
I've seen some examples here but don't understand how to apply them, or whether that's even what I'm looking for.
I also noticed that the boto3 S3 module has options like a Callback for download_fileobj (here) and that you can specify a chunk size (here), but again, I don't understand how to apply this to a Flask response.
I know of a way to solve this that involves sending a signed download link to the client to download the item, but then I would have to implement the file deletion on the client side.
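For reference, a response can be streamed from S3 in chunks with a generator, which is roughly the pattern being asked about. This is only a sketch (the bucket name is reused from the question, the chunk size is arbitrary), and note that streaming by itself does not lift the API Gateway/Lambda payload limit:

from flask import Response, stream_with_context


@app.route('/polling/<key>')
def polling(key):
    try:
        obj = s3.get_object(Bucket='carusoapi', Key=key)
    except Exception:
        return 'File not ready yet', 204

    def generate():
        # StreamingBody.iter_chunks() yields the object body piece by piece.
        for chunk in obj['Body'].iter_chunks(chunk_size=1024 * 1024):
            yield chunk
        # Delete only after the whole body has been streamed out.
        s3.delete_object(Bucket='carusoapi', Key=key)

    headers = {'Content-Disposition': 'attachment; filename="{}"'.format(key)}
    return Response(stream_with_context(generate()), headers=headers)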

Silent Failure of S3 PutObject?

I have been trying to programmatically upload SNS messages to an S3 bucket using the S3.Object.put() method like so:
bucket_resource = boto3.resource('s3')
bucket_client = boto3.client('s3')

body = subject + message
object = bucket_resource.Object(bucket_name, folder + '/' + fn)
object.put(Body=body)
This has not worked, so I have tried the following to try and upload an object to a particular S3 bucket.
body = subject + message
folder = datetime.datetime.today().strftime('%Y-%m-%d')
fn = datetime.datetime.today().strftime('%H:%M:%S')
key = folder_name + '/' + fn

bucket = s3.Bucket(bucket_name)
bucket.upload_file(body, key)
However, both of these methods are failing silently. I am not getting any access denials, error messages, etc., but the message is also not being uploaded to the bucket. I'm not sure what's happening with each invocation of the function and would appreciate any guidance from people who have successfully uploaded files to buckets programmatically.
Note:
I have bucket policies in place where my account is the only account that can put objects in the bucket. Do I need an addendum to give Lambda permission to put objects in the bucket?
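On the permissions question: a Lambda function makes its S3 calls under its execution role, not under your account's user, so that role needs its own s3:PutObject permission in addition to whatever the bucket policy says; whether the bucket policy itself also needs adjusting depends on how it is written. A minimal sketch of the kind of policy statement involved, expressed here as a Python dict with a placeholder bucket name:

lambda_s3_put_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            # Placeholder bucket name; scope this to your actual bucket.
            "Resource": "arn:aws:s3:::my-sns-archive-bucket/*",
        }
    ],
}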
There was no error for me either, but I'm using the .NET Core SDK. It turns out my problem was the function finishing before it was able to put the file in S3. In the .NET SDK the PutObject call is asynchronous (PutObjectAsync), and even though I was waiting for it to finish, it turns out I wasn't doing that correctly due to the layers of functions I had.
For anyone struggling with no errors and no files in S3, check that your function isn't finishing before the file is actually uploaded to S3.
I have not used Python, but I faced a similar issue where my React app crashed when I called s3.upload(). The following answer helped me solve it:
https://stackoverflow.com/a/57139331/5295723
I had to convert the base64 image buffer to binary and it worked.
See here for handling errors in Python.
Basically, wrap the SDK call in a try/except, something along the lines of:
import boto3
from botocore.exceptions import ClientError, ParamValidationError

try:
    bucket_resource = boto3.resource('s3')
    bucket_client = boto3.client('s3')

    body = subject + message
    object = bucket_resource.Object(bucket_name, folder + '/' + fn)
    object.put(Body=body)
except s3.exceptions.<fill in>:
    print("known error occurred")
except ClientError as e:
    print("Unexpected error: %s" % e)
This is the link the original poster provided; it shows the possible exceptions you can catch for an S3 client.
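As a usage note, when catching ClientError you can inspect the error code to tell apart cases such as a missing bucket or denied access. A short sketch, reusing the variables from the snippet above:

try:
    object.put(Body=body)
except ClientError as e:
    error_code = e.response["Error"]["Code"]
    if error_code == "AccessDenied":
        print("The caller is not allowed to put objects in this bucket")
    elif error_code == "NoSuchBucket":
        print("The bucket does not exist")
    else:
        print("Unexpected error: %s" % e)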
