boto3: How to interact with DigitalOcean S3 Spaces when CDN is enabled - python

I'm working with DigitalOcean Spaces (S3-compatible storage) with the CDN enabled.
Any file can be accessed via a direct URL of the form:
https://my-bucket.fra1.digitaloceanspaces.com/<file_key>
If the CDN is enabled, the file can also be accessed via an additional CDN URL:
https://my-bucket.fra1.cdn.digitaloceanspaces.com/<file_key>
where fra1 is the region name.
When I use the boto3 SDK for Python, the file URL generated by boto3 looks like this:
https://fra1.digitaloceanspaces.com/my-bucket/<file_key>
# note that the bucket name is no longer part of the domain!
This format also works fine.
But if the CDN is enabled, the file URL causes an error:
EndpointConnectionError: Could not connect to the endpoint URL: https://fra1.cdn.digitaloceanspaces.com/my-bucket/<file_key>
assuming the endpoint_url was changed from
default_endpoint=https://fra1.digitaloceanspaces.com
to
default_endpoint=https://fra1.cdn.digitaloceanspaces.com
How can I connect to the CDN with the proper URL without getting an error?
And why does boto3 use a different URL format? Can any workaround be applied in this case?
code:
import boto3

s3_client = boto3.client('s3',
                         region_name=s3_configs['default_region'],
                         endpoint_url=s3_configs['default_endpoint'],
                         aws_access_key_id=s3_configs['bucket_access_key'],
                         aws_secret_access_key=s3_configs['bucket_secret_key'])
s3_client.download_file(bucket_name, key, local_filepath)
boto3 guide for DigitalOcean Spaces.
Here is what I've also tried, but it didn't work:
Generating presigned URLs
UPDATE
Based on @Amit Singh's answer:
As I mentioned before, I've already tried this trick with presigned URLs.
I got URLs like this:
https://fra1.digitaloceanspaces.com/<my-bucket>/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
The bucket name appears after the endpoint. I had to move it to the domain level manually:
https://<my-bucket>.fra1.cdn.digitaloceanspaces.com/interiors/uploaded/images/07IRgHJ2PFhVqVrJDCIpzhghqe4TwK1cSSUXaC4T.jpeg?<presigned-url-params>
With this URL I can now connect to DigitalOcean, but another error occurs:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>SignatureDoesNotMatch</Code>
<RequestId>tx00000000000008dfdbc88-006005347c-604235a-fra1a</RequestId>
<HostId>604235a-fra1a-fra1</HostId>
</Error>
As a workaround, I've tried to use the s3v4 signature:
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(signature_version='s3v4'))
but it still fails.

boto3 is a client library for Amazon S3, not DigitalOcean Spaces. So boto3 will not recognize the CDN URL fra1.cdn.digitaloceanspaces.com, since it is provided by DigitalOcean and the CDN URL is not one of the supported URI patterns. I don't fully understand how CDNs work internally, so my guess is that there might be challenges in implementing a redirection to the correct URL.
Now that that's clear, let's see how we can get a pre-signed CDN URL. Suppose your CDN URL is https://fra1.cdn.digitaloceanspaces.com and your Space name is my-space. We want to get a pre-signed URL for an object my-example-object stored in the Space.
import os
import boto3
from botocore.client import Config

# Initialize the client
session = boto3.session.Session()
client = session.client('s3',
                        region_name='fra1',
                        endpoint_url='https://fra1.digitaloceanspaces.com',  # remove `.cdn` from the URL
                        aws_access_key_id=os.getenv('SPACES_KEY'),
                        aws_secret_access_key=os.getenv('SPACES_SECRET'),
                        config=Config(s3={'addressing_style': 'virtual'}))

# Get a presigned URL for the object
url = client.generate_presigned_url(ClientMethod='get_object',
                                    Params={'Bucket': 'my-space',
                                            'Key': 'my-example-object'},
                                    ExpiresIn=300)
print(url)
The pre-signed URL will look something like:
https://my-space.fra1.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
Add the .cdn part in between, either manually or programmatically if you need to, so that your final URL becomes:
https://my-space.fra1.cdn.digitaloceanspaces.com/my-example-object?AWSAccessKeyId=EXAMPLE7UQOTHDTF3GK4&Content-Type=text&Expires=1580419378&Signature=YIXPlynk4BALXE6fH7vqbnwjSEw%3D
This is your CDN URL.

Based on @Amit Singh's answer, I did some additional research on this issue.
The answers that helped me were found here and here.
To make boto3 presigned URLs work, I made the following updates to the client and generate_presigned_url() params.
s3_client = boto3.client('s3',
                         region_name=configs['default_region'],
                         endpoint_url=configs['default_endpoint'],
                         aws_access_key_id=configs['bucket_access_key'],
                         aws_secret_access_key=configs['bucket_secret_key'],
                         config=boto3.session.Config(signature_version='s3v4',
                                                     retries={
                                                         'max_attempts': 10,
                                                         'mode': 'standard'
                                                     },
                                                     s3={'addressing_style': 'virtual'}))
...
response = s3_client.generate_presigned_url('get_object',
                                            Params={'Bucket': bucket_name,
                                                    'Key': object_name},
                                            ExpiresIn=3600,
                                            HttpMethod=None)
After that, the .cdn domain part should be added after the region name, as sketched below.
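For example, a minimal sketch of doing that programmatically (this helper is my own, not part of boto3; the region name fra1 is just an example):
from urllib.parse import urlparse, urlunparse

def to_cdn_url(presigned_url, region='fra1'):
    # Swap the plain Spaces host for the CDN host, keeping the path and query params intact
    parts = urlparse(presigned_url)
    cdn_host = parts.netloc.replace(f'{region}.digitaloceanspaces.com',
                                    f'{region}.cdn.digitaloceanspaces.com')
    return urlunparse(parts._replace(netloc=cdn_host))

cdn_url = to_cdn_url(response)  # `response` is the presigned URL generated above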

Related

Endpoint is weird Amazon Textract Python

I'm trying to use Textract in Python. I got the code from this URL: https://github.com/aws-samples/amazon-textract-code-samples/blob/c8f34ca25113100730e0f4db3f6f316b0cff44d6/python/02-detect-text-s3.py.
I only changed s3BucketName and documentName in the code. But when I ran the code, I got this error:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://textract.USA.amazonaws.com/"
Should I alter the URL manually? If so, how can I do that?
The endpoint URL depends on your AWS region; USA is not a valid AWS region.
You can set the region when creating the boto3 client:
textract = boto3.client('textract', region_name='us-west-1')
will use https://textract.us-west-1.amazonaws.com/ as the endpoint.
Alternatively, the region can come from the profile or environment; see the boto3 configuration docs for more details.
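For example, a minimal sketch of the environment-based approach (the region value is just an example; boto3 reads AWS_DEFAULT_REGION from the environment):
import os
import boto3

os.environ['AWS_DEFAULT_REGION'] = 'us-west-1'  # or export it in your shell / set it in ~/.aws/config
textract = boto3.client('textract')  # picks up the region from the environment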

S3 presigned URL works 90 minutes after bucket creation

We generate presigned URLs in order for users to upload files directly into S3 buckets. Running integration tests we discovered a failing test where an HTTP PUT request on a presigned URL yielded a SignatureDoesNotMatch error response. Surprisingly, the same code worked fine using another bucket. We kept trying on the original bucket that caused the test to fail and were surprised when it suddenly started to work without any real code changes.
We noticed that it was roughly two hours after we had created the bucket when the test successfully ran through. Since we are located at UTC+0200 we suspected the issue to be somehow related to that time difference and/or some clock synching issue. We set out to confirm our suspicions that the same presigned URL would suddenly just work after enough time has passed. SPOILER: It does!
The following code creates a brand new bucket, generates a presigned URL suitable for file upload (ClientMethod='put_object'), and tries to HTTP PUT some data using the requests library. We re-try PUTting data every 60 seconds until it finally succeeds 5419 seconds (or 90 minutes) after the bucket was created.
Note: Even though the bucket is deleted afterwards, running the same script (using the same bucket name) now instantly succeeds. In case you want to re-confirm this behavior, make sure to use a different bucket name the second time around.
import logging
import time

import boto3
import requests
from botocore.client import Config

logger = logging.getLogger(__name__)

# region = "eu-central-1"
# region = "eu-west-1"
# region = "us-west-1"
region = "us-east-1"

s3_client = boto3.client('s3', region_name=region, config=Config(signature_version='s3v4'))

if __name__ == "__main__":
    bucket_name = "some-globally-unique-bucket-name"
    key_for_file = "test-file.txt"

    # create bucket
    if region == "us-east-1":
        # https://github.com/boto/boto3/issues/125
        s3_client.create_bucket(Bucket=bucket_name, ACL='private')
    else:
        s3_client.create_bucket(Bucket=bucket_name, ACL='private',
                                CreateBucketConfiguration={'LocationConstraint': region})
    creation_time = time.time()

    # generate presigned URL
    file_data = b"Hello Test World"
    expires_in = 4 * 3600
    url = s3_client.generate_presigned_url(ClientMethod='put_object', ExpiresIn=expires_in,
                                           Params={'Bucket': bucket_name, 'Key': key_for_file})

    time_since_bucket_creation = time.time() - creation_time
    time_interval = 60
    max_time_passed = expires_in
    success = False

    try:
        while time_since_bucket_creation < max_time_passed:
            response = requests.put(url, data=file_data)
            if response.status_code == 200:
                success = True
                break
            if b"<Code>SignatureDoesNotMatch</Code>" in response.content:
                reason = "SignatureDoesNotMatch"
            else:
                reason = str(response.content)
            time_since_bucket_creation = time.time() - creation_time
            print("=" * 50)
            print(f"{time_since_bucket_creation:.2f} s after bucket creation")
            print(f"unable to PUT data to url: {url}")
            print(f"reason: {reason}")
            print(response.content)
            time.sleep(time_interval)
    except KeyboardInterrupt:
        print("Gracefully shutting down...")

    if success:
        print("YAY! File Upload was successful!")
        time_since_bucket_creation = time.time() - creation_time
        print(f"{time_since_bucket_creation:.2f} seconds after bucket creation")
        s3_client.delete_object(Bucket=bucket_name, Key=key_for_file)

    # delete bucket
    s3_client.delete_bucket(Bucket=bucket_name)
We run integration tests with an AWS EKS cluster where we create a cluster along with some databases, S3 buckets, etc. and tear everything down after the tests have completed. Having to wait 90 minutes for the presigning of URLs to work is not feasible.
My Questions
Am I doing anything wrong?
Is this expected behavior?
Is there an acceptable workaround?
Can someone, please, confirm this behavior using the above code?
EDIT
I updated the code to create a bucket in the "us-east-1" region as suggested by "Michael - sqlbot" in the comments. The weird if statement is necessary as documented here. I am able to confirm Michael's suspicion that the behavior is NOT reproducible with "us-east-1".
In case it is of interest, the returned XML in the error case:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<AWSAccessKeyId>REDACTED</AWSAccessKeyId>
<StringToSign>AWS4-HMAC-SHA256
20190609T170351Z
20190609/eu-central-1/s3/aws4_request
c143cb44fa45c56e52b04e61b777ae2206e0aaeed40dafc78e036878fa91dfd6</StringToSign>
<SignatureProvided>REDACTED</SignatureProvided>
<StringToSignBytes>REDACTED</StringToSignBytes>
<CanonicalRequest>PUT
/test-file.txt
X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=REDACTED%2F20190609%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Date=20190609T170351Z&X-Amz-Expires=14400&X-Amz-SignedHeaders=host
host:some-globally-unique-bucket-name.s3.eu-central-1.amazonaws.com
host
UNSIGNED-PAYLOAD</CanonicalRequest>
<CanonicalRequestBytes>REDACTED</CanonicalRequestBytes>
<RequestId>E6CBBC7D2E4D322E</RequestId>
<HostId>j1dM1MNaXaDhzMUXKhqdHd6+/Rl1C3GzdL9YDq0CuP8brQZQV6vbyE9Z63HBHiBWSo+hb6zHKVs=</HostId>
</Error>
Here's what you're bumping into:
A temporary redirect is a type of error response that signals to the requester that they should resend the request to a different endpoint. Due to the distributed nature of Amazon S3, requests can be temporarily routed to the wrong facility. This is most likely to occur immediately after buckets are created or deleted.
For example, if you create a new bucket and immediately make a request to the bucket, you might receive a temporary redirect, depending on the location constraint of the bucket. If you created the bucket in the US East (N. Virginia) AWS Region, you will not see the redirect because this is also the default Amazon S3 endpoint.
However, if the bucket is created in any other Region, any requests for the bucket go to the default endpoint while the bucket's DNS entry is propagated. The default endpoint redirects the request to the correct endpoint with an HTTP 302 response. Temporary redirects contain a URI to the correct facility, which you can use to immediately resend the request.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html
Note that the last part -- which you can use to immediately resend the request -- is not quite accurate. You can -- but if the request uses Signature Version 4, then following the redirect to the new hostname will result in a SignatureDoesNotMatch error because the hostname will be different. Back in the old days of Signature Version 2, the bucket name was included in the signature but the endpoint hostname itself was not, so the redirect to a different endpoint hostname would not invalidate the signature.
None of this would be a problem if boto were doing the right thing and using the correct regional endpoint to create the signed URL -- but for some reason, it uses the "global" (generic) endpoint -- which causes S3 to issue those redirects for the first few minutes of the bucket's lifetime, because DNS hasn't been updated, so the request misroutes to us-east-1 and gets redirected. That's why I suspected us-east-1 wouldn't exhibit the behavior.
This should be the default behavior, but it isn't; still, it seems like there should be a cleaner way to do this, automatically via configuration... and there may be... but I haven't found it in the documentation.
As a workaround, the client constructor accepts an endpoint_url argument, which seems to serve the purpose. As it turns out, s3.${region}.amazonaws.com is a valid endpoint for each S3 region, so they can be constructed from a region string.
s3_client = boto3.client('s3', region_name=region, endpoint_url=('https://s3.' + region + '.amazonaws.com'), config=...)
Long-time users of S3 may be suspicious of the claim that all regions support this, but it is accurate as of this writing. Originally, some regions formerly used a dash rather than a dot, e.g. s3-us-west-2.amazonaws.com and this is still valid in those older regions, but all regions now support the canonical form mentioned above.
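Putting that together, a rough sketch of the workaround (the bucket name and region below are just the placeholders from the question):
import boto3
from botocore.client import Config

region = "eu-central-1"
s3_client = boto3.client('s3',
                         region_name=region,
                         # regional endpoint, so the presigned URL never hits the redirecting global endpoint
                         endpoint_url='https://s3.' + region + '.amazonaws.com',
                         config=Config(signature_version='s3v4'))

url = s3_client.generate_presigned_url(ClientMethod='put_object',
                                       ExpiresIn=4 * 3600,
                                       Params={'Bucket': 'some-globally-unique-bucket-name',
                                               'Key': 'test-file.txt'})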

When saving FogBugz attachment, server always returns empty response (with some headers)

I'm trying to get a case attachment and save it in a local folder. I have a problem using the attachment URL to download it: each time the server returns an empty result with status code 200.
This is a sample URL I use (host and token changed):
https://example.fogbugz.com/default.asp?pg=pgDownload&pgType=pgFile&ixBugEvent=385319&ixAttachment=56220&sFileName=Log.7z&sTicket=&sToken=1234567890627ama72kaors2grlgsk
I have tried using token instead of sToken, but it makes no difference. If I copy the above URL into Chrome it won't work either, but if I log in to FogBugz (Manuscript) and then try this URL again, it works. So I suppose there are some security issues here.
BTW, I use the Python FogBugz API for this and save the file using urllib: urllib.request.urlretrieve(url, "fb/" + file_name)
The solution I found is to use cookies from the web browser where I previously logged in to the FogBugz account I use. So it does look like a security issue.
For that I used pycookiecheat (for Windows see my fork: https://github.com/luskan/pycookiecheat). For the full code see here: https://gist.github.com/luskan/66ffb8f82afb96d29d3f56a730340adc. A rough sketch is shown below.
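A minimal sketch of that workaround, assuming Chrome is the browser you are logged in with (chrome_cookies() is pycookiecheat's helper; the URL is the sample one from the question):
import requests
from pycookiecheat import chrome_cookies

url = ("https://example.fogbugz.com/default.asp?pg=pgDownload&pgType=pgFile"
       "&ixBugEvent=385319&ixAttachment=56220&sFileName=Log.7z"
       "&sTicket=&sToken=1234567890627ama72kaors2grlgsk")
cookies = chrome_cookies(url)                  # reuse the browser session's cookies
response = requests.get(url, cookies=cookies)
response.raise_for_status()
with open("fb/Log.7z", "wb") as f:
    f.write(response.content)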

Why do I get an access denied when I try to do a redirect from one s3 object to another?

I have been following this post:
https://forums.aws.amazon.com/message.jspa?messageID=484342#484342
Just so that whenever I generate a presigned URL, I don't expose my AWS Access Key ID.
url = s3_client.generate_presigned_url('get_object', Params={'Bucket': Bucket, 'Key': Key})
s3_client.put_object(Bucket="dummybucket", Key=other_key, WebsiteRedirectLocation=url)
My "dummybucket" has ACL='public-read'
So whenever I try to access http://dummybucket.s3.amazonaws.com/other_key
I get an access denied rather than the original object I'm trying to get.
I've also uploaded a file into the "other_bucket" and I can access that fine from the browser.
Things I haven't done:
Add policy to S3 bucket I'm trying to access
Enable website configuration for S3 bucket
EDIT: I cleared my browser cache too
I realized I had the wrong S3 bucket URL for the redirect. It should be the website endpoint:
*bucket_name*.s3-website.*region*.amazonaws.com
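In other words, the WebsiteRedirectLocation metadata is only honored on the website endpoint, not on the REST endpoint. A quick illustration with the names from the question (the region is a placeholder):
bucket_name = "dummybucket"
region = "us-east-1"          # placeholder region
key = "other_key"
# REST endpoint - serves the object itself and ignores WebsiteRedirectLocation:
rest_url = f"http://{bucket_name}.s3.amazonaws.com/{key}"
# Website endpoint - honors the redirect (static website hosting must be enabled on the bucket):
website_url = f"http://{bucket_name}.s3-website.{region}.amazonaws.com/{key}"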

how to download a file from s3 bucket with a temporary token in python

I have a Django web app and I want to allow it to download files from my S3 bucket.
The files are not public. I have an IAM policy to access them.
The problem is that I do NOT want to download the file to the Django app server and then serve it to the client; that would be like downloading it twice. I want the client of the Django app to download it directly.
Also, I don't think it's safe to pass my IAM credentials in an HTTP request, so I think I need to use a temporary token.
I read:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
but I just do not understand how to generate a temporary token on the fly.
A Python solution (maybe using boto) would be appreciated.
With Boto 2, it is really easy to generate time-limited download URLs, provided your IAM policy has the proper permissions. I am using this approach to serve videos to logged-in users from a private S3 bucket.
from boto.s3.connection import S3Connection
conn = S3Connection('<aws access key>', '<aws secret key>')
bucket = conn.get_bucket('mybucket')
key = bucket.get_key('mykey', validate=False)
url = key.generate_url(86400)
This generates a download URL for the key mykey in the given bucket, valid for 24 hours (86400 seconds). Without validate=False, Boto 2 will first check that the key actually exists in the bucket and, if not, will throw an exception. With these server-controlled files this is often an unnecessary extra step, hence validate=False in the example.
In Boto3 the API is quite different:
import boto3

s3 = boto3.client('s3')

# Generate the URL to get 'mykey' from 'mybucket'
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': 'mybucket',
        'Key': 'mykey'
    },
    ExpiresIn=86400
)
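Once you have the presigned URL, the client can fetch the object directly with a plain HTTP GET, for example (a minimal sketch using requests):
import requests

r = requests.get(url)          # `url` is the presigned URL generated above
r.raise_for_status()
with open('mykey', 'wb') as f:
    f.write(r.content)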
