S3 presigned URL works 90 minutes after bucket creation - python

We generate presigned URLs so that users can upload files directly into S3 buckets. While running integration tests, we discovered a failing test where an HTTP PUT request to a presigned URL yielded a SignatureDoesNotMatch error response. Surprisingly, the same code worked fine against another bucket. We kept retrying against the original bucket that caused the test to fail and were surprised when it suddenly started to work without any real code changes.
We noticed that it was roughly two hours after we had created the bucket when the test successfully ran through. Since we are located at UTC+0200, we suspected the issue to be somehow related to that time difference and/or some clock-syncing issue. We set out to confirm our suspicion that the same presigned URL would suddenly just work after enough time had passed. SPOILER: It does!
The following code creates a brand new bucket, generates a presigned URL suitable for file upload (ClientMethod='put_object'), and tries to HTTP PUT some data using the requests library. We re-try PUTting data every 60 seconds until it finally succeeds 5419 seconds (roughly 90 minutes) after the bucket was created.
Note: Even though the bucket is deleted afterwards, running the same script (using the same bucket name) now instantly succeeds. In case you want to re-confirm this behavior, make sure to use a different bucket name the second time around.
import logging
import time

import boto3
import requests
from botocore.client import Config

logger = logging.getLogger(__name__)

# region = "eu-central-1"
# region = "eu-west-1"
# region = "us-west-1"
region = "us-east-1"

s3_client = boto3.client('s3', region_name=region, config=Config(signature_version='s3v4'))

if __name__ == "__main__":
    bucket_name = "some-globally-unique-bucket-name"
    key_for_file = "test-file.txt"

    # create bucket
    if region == "us-east-1":
        # https://github.com/boto/boto3/issues/125
        s3_client.create_bucket(Bucket=bucket_name, ACL='private')
    else:
        s3_client.create_bucket(Bucket=bucket_name, ACL='private',
                                CreateBucketConfiguration={'LocationConstraint': region})
    creation_time = time.time()

    # generate presigned URL
    file_data = b"Hello Test World"
    expires_in = 4 * 3600
    url = s3_client.generate_presigned_url(ClientMethod='put_object', ExpiresIn=expires_in,
                                           Params={'Bucket': bucket_name, 'Key': key_for_file})

    time_since_bucket_creation = time.time() - creation_time
    time_interval = 60
    max_time_passed = expires_in
    success = False
    try:
        while time_since_bucket_creation < max_time_passed:
            response = requests.put(url, data=file_data)
            if response.status_code == 200:
                success = True
                break
            if b"<Code>SignatureDoesNotMatch</Code>" in response.content:
                reason = "SignatureDoesNotMatch"
            else:
                reason = str(response.content)
            time_since_bucket_creation = time.time() - creation_time
            print("="*50)
            print(f"{time_since_bucket_creation:.2f} s after bucket creation")
            print(f"unable to PUT data to url: {url}")
            print(f"reason: {reason}")
            print(response.content)
            time.sleep(time_interval)
    except KeyboardInterrupt:
        print("Gracefully shutting down...")

    if success:
        print("YAY! File Upload was successful!")
        time_since_bucket_creation = time.time() - creation_time
        print(f"{time_since_bucket_creation:.2f} seconds after bucket creation")
        s3_client.delete_object(Bucket=bucket_name, Key=key_for_file)

    # delete bucket
    s3_client.delete_bucket(Bucket=bucket_name)
We run integration tests against an AWS EKS cluster, where we create a cluster along with some databases, S3 buckets, etc. and tear everything down after the tests have completed. Having to wait 90 minutes for presigned URLs to work is not feasible.
My Questions
Am I doing anything wrong?
Is this expected behavior?
Is there an acceptable workaround?
Can someone, please, confirm this behavior using the above code?
EDIT
I updated the code to create a bucket in the "us-east-1" region as suggested by "Michael - sqlbot" in the comments. The weird if statement is necessary as documented here. I am able to confirm Michael's suspicion that the behavior is NOT reproducible with "us-east-1".
In case it is of interest, the returned XML in the error case:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<AWSAccessKeyId>REDACTED</AWSAccessKeyId>
<StringToSign>AWS4-HMAC-SHA256
20190609T170351Z
20190609/eu-central-1/s3/aws4_request
c143cb44fa45c56e52b04e61b777ae2206e0aaeed40dafc78e036878fa91dfd6</StringToSign>
<SignatureProvided>REDACTED</SignatureProvided>
<StringToSignBytes>REDACTED</StringToSignBytes>
<CanonicalRequest>PUT
/test-file.txt
X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=REDACTED%2F20190609%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Date=20190609T170351Z&X-Amz-Expires=14400&X-Amz-SignedHeaders=host
host:some-globally-unique-bucket-name.s3.eu-central-1.amazonaws.com
host
UNSIGNED-PAYLOAD</CanonicalRequest>
<CanonicalRequestBytes>REDACTED</CanonicalRequestBytes>
<RequestId>E6CBBC7D2E4D322E</RequestId>
<HostId>j1dM1MNaXaDhzMUXKhqdHd6+/Rl1C3GzdL9YDq0CuP8brQZQV6vbyE9Z63HBHiBWSo+hb6zHKVs=</HostId>
</Error>

Here's what you're bumping into:
A temporary redirect is a type of error response that signals to the requester that they should resend the request to a different endpoint. Due to the distributed nature of Amazon S3, requests can be temporarily routed to the wrong facility. This is most likely to occur immediately after buckets are created or deleted.
For example, if you create a new bucket and immediately make a request to the bucket, you might receive a temporary redirect, depending on the location constraint of the bucket. If you created the bucket in the US East (N. Virginia) AWS Region, you will not see the redirect because this is also the default Amazon S3 endpoint.
However, if the bucket is created in any other Region, any requests for the bucket go to the default endpoint while the bucket's DNS entry is propagated. The default endpoint redirects the request to the correct endpoint with an HTTP 302 response. Temporary redirects contain a URI to the correct facility, which you can use to immediately resend the request.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html
Note that the last part -- which you can use to immediately resend the request -- is not quite accurate. You can -- but if the request uses Signature Version 4, then following the redirect to the new hostname will result in a SignatureDoesNotMatch error because the hostname will be different. Back in the old days of Signature Version 2, the bucket name was included in the signature but the endpoint hostname itself was not, so the redirect to a different endpoint hostname would not invalidate the signature.
None of this would be a problem if boto were doing the right thing and using the correct regional endpoint to create the signed URL -- but for some reason, it uses the "global" (generic) endpoint -- which causes S3 to issue those redirects for the first few minutes of the bucket's lifetime, because DNS hasn't been updated, so the request misroutes to us-east-1 and gets redirected. That's why I suspected us-east-1 wouldn't exhibit the behavior.
This should be the default behavior, but it isn't; still, it seems like there should be a cleaner way to do this, automatically via configuration... and there may be... but I haven't found it in the documentation.
As a workaround, the client constructor accepts an endpoint_url argument, which seems to serve the purpose. As it turns out, s3.${region}.amazonaws.com is a valid endpoint for each S3 region, so they can be constructed from a region string.
s3_client = boto3.client('s3', region_name=region, endpoint_url=('https://s3.' + region + '.amazonaws.com'), config=...)
Long-time users of S3 may be suspicious of the claim that all regions support this, but it is accurate as of this writing. Some regions formerly used a dash rather than a dot, e.g. s3-us-west-2.amazonaws.com, and that form is still valid in those older regions, but all regions now support the dotted form mentioned above.
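For illustration, a minimal sketch of that workaround applied to the presigned-upload case (the region and bucket name below are placeholders, not from the original post):

import boto3
from botocore.client import Config

region = "eu-central-1"  # placeholder; any non-us-east-1 region shows the effect

# Point the client at the regional endpoint explicitly so that generated
# presigned URLs carry the regional hostname instead of the generic one.
s3_client = boto3.client(
    's3',
    region_name=region,
    endpoint_url=f'https://s3.{region}.amazonaws.com',
    config=Config(signature_version='s3v4'),
)

url = s3_client.generate_presigned_url(
    ClientMethod='put_object',
    ExpiresIn=3600,
    Params={'Bucket': 'some-globally-unique-bucket-name', 'Key': 'test-file.txt'},
)
# The URL should now reference s3.eu-central-1.amazonaws.com, so a PUT issued
# right after bucket creation is not misrouted to us-east-1 and redirected.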

Related

boto3: How to interract with DigitalOcean S3 Spaces when CDN is enabled

I'm working with DigitalOcean Spaces (S3 storage protocol) which has enabled CDN.
Any file on s3 can be accessed via direct URL in the given form:
https://my-bucket.fra1.digitaloceanspaces.com/<file_key>
If CDN is enabled, the file can be accessed via additional CDN URL:
https://my-bucket.fra1.cdn.digitaloceanspaces.com/<file_key>
where fra1 is a region_name.
When I'm using boto3 SDK for Python, the file URL is the following (generated by boto3):
https://fra1.digitaloceanspaces.com/my-bucket/<file_key>
# note that the bucket name is no longer part of the domain!
This format also works fine.
But if the CDN is enabled, the file URL causes an error:
EndpointConnectionError: Could not connect to the endpoint URL: https://fra1.cdn.digitaloceanspaces.com/my-bucket/<file_key>
assuming the endpoint_url was changed from
default_endpoint=https://fra1.digitaloceanspaces.com
to
default_endpoint=https://fra1.cdn.digitaloceanspaces.com
How can I connect to the CDN with the proper URL without getting an error?
And why does boto3 use a different URL format? Can any workaround be applied in this case?
code:
s3_client = boto3.client('s3',
                         region_name=s3_configs['default_region'],
                         endpoint_url=s3_configs['default_endpoint'],
                         aws_access_key_id=s3_configs['bucket_access_key'],
                         aws_secret_access_key=s3_configs['bucket_secret_key'])

s3_client.download_file(bucket_name, key, local_filepath)
boto3 guide for DigitalOcean Spaces.
Here is what I've also tried, but it didn't work:
Generate presigned URLs
UPDATE
Based on @Amit Singh's answer:
As I mentioned before, I've already tried this trick with presigned URLs.
I got URLs like this:
https://fra1.digitaloceanspaces.com/<my-bucket>/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
The bucket name appears after the endpoint. I had to move it to the domain level manually:
https://<my-bucket>.fra1.cdn.digitaloceanspaces.com/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
With this URL I can now connect to DigitalOcean, but another error occurs:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>SignatureDoesNotMatch</Code>
<RequestId>tx00000000000008dfdbc88-006005347c-604235a-fra1a</RequestId>
<HostId>604235a-fra1a-fra1</HostId>
</Error>
As a workaround, I've tried to use the s3v4 signature:
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(signature_version='s3v4'))
but it still fails.
boto3 is a client library for Amazon S3, not DigitalOcean Spaces. So boto3 will not recognize the CDN URL fra1.cdn.digitaloceanspaces.com, since it is provided by DigitalOcean and the CDN URL is not one of the supported URI patterns. I don't fully understand how CDNs work internally, so my guess is there might be challenges with implementing this redirection to the correct URL.
Now that that's clear, let's see how we can get a pre-signed CDN URL. Suppose your CDN URL is https://fra1.cdn.digitaloceanspaces.com and your space name is my-space. We want to get a pre-signed URL for an object my-example-object stored in the space.
import os

import boto3
from botocore.client import Config

# Initialize the client
session = boto3.session.Session()
client = session.client('s3',
                        region_name='fra1',
                        endpoint_url='https://fra1.digitaloceanspaces.com',  # Remove `.cdn` from the URL
                        aws_access_key_id=os.getenv('SPACES_KEY'),
                        aws_secret_access_key=os.getenv('SPACES_SECRET'),
                        config=Config(s3={'addressing_style': 'virtual'}))

# Get a presigned URL for the object
url = client.generate_presigned_url(ClientMethod='get_object',
                                    Params={'Bucket': 'my-space',
                                            'Key': 'my-example-object'},
                                    ExpiresIn=300)
print(url)
The pre-signed URL will look something like:
https://my-space.fra1.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
Add the cdn part in between, either manually or programmatically if you need to, so that your final URL becomes:
https://my-space.fra1.cdn.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
This is your CDN URL.
Based on @Amit Singh's answer, I did some additional research on this issue.
Answers that helped me were found here and here.
To make boto3 presigned URLs work, I've made the following updates to the client and generate_presigned_url() params.
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(signature_version='s3v4',
                                                     retries={
                                                         'max_attempts': 10,
                                                         'mode': 'standard'
                                                     },
                                                     s3={'addressing_style': 'virtual'}))

...

response = s3_client.generate_presigned_url('get_object',
                                            Params={'Bucket': bucket_name,
                                                    'Key': object_name},
                                            ExpiresIn=3600,
                                            HttpMethod=None)
After that, the .cdn domain part should be added after the region name.
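For illustration, a minimal sketch of that programmatic insertion, assuming the virtual-hosted URL form shown above (the helper name and region are illustrative, not from the original posts):

def to_cdn_url(presigned_url, region='fra1'):
    # Rewrite the Spaces hostname to its CDN counterpart, e.g.
    # my-bucket.fra1.digitaloceanspaces.com -> my-bucket.fra1.cdn.digitaloceanspaces.com
    return presigned_url.replace(f'.{region}.digitaloceanspaces.com',
                                 f'.{region}.cdn.digitaloceanspaces.com', 1)

cdn_url = to_cdn_url(response)  # `response` is the presigned URL generated above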

Python boto3 checking for valid bucket within region

I have seen examples for checking whether an S3 bucket exists and have implemented them below. My bucket is located in the us-east-1 region, but the following code doesn't throw an exception. Is there a way to make the check region-specific, depending on my session?
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(
    profile_name='TEST',
    region_name='ap-south-1'
)
s3 = session.resource('s3')
bucket_name = 'TEST_BUCKET'

try:
    s3.meta.client.head_bucket(Bucket=bucket_name)
except ClientError as c:
    print(c)
It does not matter which S3 regional endpoint you send the request to. The underlying SDK (boto3) will redirect as needed. It's preferable, however, to target the correct region if you know it in advance, to save on redirects.
You can see this in detail if you use the awscli in debug mode:
aws s3api head-bucket --bucket mybucket --region ap-south-1 --debug
You will see debug output similar to this:
DEBUG - S3 client configured for region ap-south-1 but the bucket mybucket is in region us-east-1; Please configure the proper region to avoid multiple unnecessary redirects and signing attempts.
DEBUG - Switching signature version for service s3 to version s3v4 based on config file override.
DEBUG - Updating URI from https://s3.ap-south-1.amazonaws.com/mybucket to https://s3.us-east-1.amazonaws.com/mybucket
Note that the awscli is built on botocore, the same library that underlies the boto3 SDK used by your Python script.
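If you do want an explicit region check, a minimal sketch (bucket and profile names are the placeholders from the question) that compares the bucket's reported location against the session's region via get_bucket_location:

import boto3
from botocore.exceptions import ClientError

session = boto3.Session(profile_name='TEST', region_name='ap-south-1')
client = session.client('s3')
bucket_name = 'TEST_BUCKET'

try:
    # get_bucket_location reports None as the LocationConstraint for us-east-1
    location = client.get_bucket_location(Bucket=bucket_name)['LocationConstraint'] or 'us-east-1'
    if location != session.region_name:
        print(f"Bucket exists but lives in {location}, not {session.region_name}")
except ClientError as c:
    print(c)  # e.g. 404 if the bucket does not exist, 403 if access is denied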

AWS boto3: see proof that client is using transfer acceleration endpoint?

I'm trying to enable Transfer Acceleration for some AWS S3 buckets.
I start up my client session:
client = boto3.client(
    "s3",
    aws_access_key_id=environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=environ.get("AWS_SECRET_ACCESS_KEY")
)
Then I turn Transfer Acceleration on through the S3 console, and have ensured it is enabled and turned on in the code as such:
response = client.put_bucket_accelerate_configuration(
    Bucket='string',
    AccelerateConfiguration={
        'Status': 'Enabled'
    }
)
and
response = client.get_bucket_accelerate_configuration(
    Bucket='string'
)
both snippets come straight from boto3 docs. I am able to upload to the bucket successfully later on in the code with:
client.upload_fileobj(data, environ.get("AWS_S3_BUCKET"), 'key')
I tried setting the endpoint_url param while starting the client session, but this just created a new folder (with my bucket name) inside my bucket.
It seems that boto3 is the only SDK that doesn't have some sort of "use transfer acceleration endpoint" flag. I know it is enabled on the bucket, and I have proof of that, but I have no proof that it is actually using the endpoint.
I've tried going through client metadata, bucket metadata, and every other client method that returns any sort of data, and I can't find proof that it actually used the acceleration endpoint.
Am I missing something?
Connect to S3 accelerate endpoint with boto3 mentions using:
Config(s3={"use_accelerate_endpoint": True})
This parameter is listed in Config Reference — botocore documentation:
s3 (dict)
use_accelerate_endpoint -- Refers to whether to use the S3 Accelerate endpoint. The value must be a boolean. If True, the client will use the S3 Accelerate endpoint. If the S3 Accelerate endpoint is being used then the addressing style will always be virtual.
So try using:
s3_client = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

Flask stream/multipart file from S3

I'm using Flask in an AWS API Gateway/Lambda environment (thanks to Zappa), but there is a limit on response size, so Flask's send_file is not enough in this context.
Is there a way I can stream/multipart (not sure if these are the correct terms) a file-like object as a response in Flask? I can't send request bodies larger than 5 MB (6 MB?) in the AWS serverless environment.
Current code (simple S3 proxy that deletes the object once downloaded):
import boto3
from flask import Flask, send_file
from io import BytesIO

app = Flask(__name__)
s3 = boto3.client('s3')

@app.route('/polling/<key>')
def polling(key):
    obj = BytesIO()
    try:
        s3.download_fileobj('carusoapi', key, obj)
        s3.delete_object(Bucket='carusoapi', Key=key)
        obj.seek(0)  # rewind before sending, otherwise the response body is empty
        return send_file(obj, as_attachment=True, attachment_filename=key)
    except Exception:
        return 'File not ready yet', 204
I've seen some examples here but don't understand how to apply them or if that's even what I'm looking for.
I also noticed that boto3 S3 module has options like callback for download_fileobj here and you can specify chunksize here, but again, I don't understand how to apply this to a Flask response.
I know of a way to solve this that involves sending a signed download link to the client to download the item, but then I would have to implement the file deletion on the client side.
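For reference, a minimal sketch of one way to stream the object back in chunks through a generator-backed Flask Response instead of buffering it with send_file (the route, bucket name, and chunk size are illustrative; this by itself does not lift the Lambda/API Gateway payload limit):

import boto3
from flask import Flask, Response

app = Flask(__name__)
s3 = boto3.client('s3')

@app.route('/stream/<key>')
def stream(key):
    obj = s3.get_object(Bucket='carusoapi', Key=key)
    body = obj['Body']  # botocore StreamingBody

    def generate():
        # Yield the object in chunks so the whole file is never held in memory at once
        for chunk in body.iter_chunks(chunk_size=1024 * 1024):
            yield chunk

    headers = {'Content-Disposition': f'attachment; filename="{key}"'}
    return Response(generate(), headers=headers,
                    content_type=obj.get('ContentType', 'application/octet-stream'))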

Google Admin Directory API - Send a query via apiclient

I am retrieving a ChromeOS device MAC address via the Google Admin Directory API using the device's Serial Number as reference, and am making my calls through apiclient.
service = discovery.build('admin', 'directory_v1', developerKey=settings.API_KEY)
Here are the calls available for ChromeOS devices; my issue is that I require a Device ID in order to execute the following:
service.chromeosdevices().get(customerId=settings.CID, deviceId=obtained_id, projection=None).execute()
I can send a GET query via the following format:
https://www.googleapis.com/admin/directory/v1/customer/my_customer/devices/chromeos?projection=full&query=id:" + serial + "&orderBy=status&sortOrder=ascending&maxResults=10", "GET")
... but I'm trying to avoid using OAuth2 and just use my API key. Passing the key in a GET request doesn't work either, as it still returns a "Login Required" notice.
How do I squeeze the above query into an apiclient-friendly format? The only option I found via the above calls was to request every device we have (via list), then sift through the mountain of data for the matching Serial number, which seems silly and excessive.
I did notice I could call apiclient.http.HttpRequests, but I couldn't find a way to pass the API key through it either. There's new_batch_http_request, but I can't discern from the docs how to simply pass a URL to it.
Thank you!
Got it!
You can't use just an API key for Directory API queries; you need a service account.
I'm using google-auth (see here) since oauth2client is deprecated.
You also need to:
Delegate the necessary permissions for your service account (mine has the role of Viewer and has scope access to https://www.googleapis.com/auth/admin.directory.device.chromeos.readonly)
Delegate API access to it separately in the Admin Console (Security -> Advanced Settings -> Authentication)
Get your JSON client secret key and place it with your app (don't include it in your VCS)
Obtain your credentials like this:
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    settings.CLIENT_KEY,
    scopes=settings.SCOPES,
    subject=settings.ADMIN_USER)
where ADMIN_USER is the email address of an authorized Domain admin.
Then you send a GET request like so:
from google.auth.transport.requests import AuthorizedSession

authed_session = AuthorizedSession(credentials)
response = authed_session.get(request_id_url)
This returns a requests Response object you can read via response.content.
Hope it helps someone else!
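Putting the pieces together, a minimal sketch of the serial-number query from the question issued through the authorized session (the settings values and serial are placeholders; the response field name is the one documented for chromeosdevices.list):

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

credentials = service_account.Credentials.from_service_account_file(
    settings.CLIENT_KEY,
    scopes=settings.SCOPES,
    subject=settings.ADMIN_USER)

serial = "5CD1234ABC"  # placeholder serial number
url = ("https://www.googleapis.com/admin/directory/v1/customer/my_customer/devices/chromeos"
       f"?projection=full&query=id:{serial}&orderBy=status&sortOrder=ascending&maxResults=10")

authed_session = AuthorizedSession(credentials)
response = authed_session.get(url)
devices = response.json().get("chromeosdevices", [])  # matching devices, if any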
