I'm trying to create an S3 bucket in every AWS region with boto3 in Python, but I'm failing to create a bucket in four regions (af-south-1, eu-south-1, ap-east-1 and me-south-1).
My python code:
def create_bucket(name, region):
    s3 = boto3.client('s3')
    s3.create_bucket(Bucket=name, CreateBucketConfiguration={'LocationConstraint': region})
and the exception I get:
botocore.exceptions.ClientError: An error occurred (InvalidLocationConstraint) when calling the CreateBucket operation: The specified location-constraint is not valid
I can create buckets in these regions from the AWS website, but that is not good enough for me, so I tried to create them directly via the REST API without boto3.
url: bucket-name.s3.amazonaws.com
body:
<?xml version="1.0" encoding="UTF-8"?>
<CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<LocationConstraint>eu-south-1</LocationConstraint>
</CreateBucketConfiguration>
but the response was similar to the exception:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InvalidLocationConstraint</Code>
<Message>The specified location-constraint is not valid</Message>
<LocationConstraint>eu-south-1</LocationConstraint>
<RequestId>********</RequestId>
<HostId>**************</HostId>
</Error>
Does anyone have an idea why I can do it manually from the website but not from Python?
The regions your code fails in are relatively new regions that you need to opt in to before using them; see Managing AWS Regions.
Newer AWS regions only support regional endpoints, so when creating buckets in one of those regions you need to send the request to that region's endpoint.
Since I was creating buckets in multiple regions, I set the endpoint by creating a new instance of the client for each region. (This was in Node.js, but the same should work with boto3.)
client = boto3.client('s3', region_name='region')
See the same problem on Node.js here
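Putting the answer together, a minimal sketch of a fixed create_bucket helper that builds a region-scoped client (assuming the opt-in regions have already been enabled on the account):

import boto3

def create_bucket(name, region):
    # A client bound to the target region talks to that region's endpoint,
    # which is required for opt-in regions such as eu-south-1.
    s3 = boto3.client('s3', region_name=region)
    # Note: us-east-1 is the one region that must NOT receive a LocationConstraint.
    s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={'LocationConstraint': region},
    )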
I am making calls to an API using the Python requests library, and I receive the response as JSON. Currently I save the JSON response to my local computer; what I would like to do is load the JSON response directly into an S3 bucket. The reason for loading it into an S3 bucket is that the bucket acts as the source for parsing the JSON response into relational output. I was wondering how I can load the JSON file directly into an S3 bucket without using an access key or secret key?
Most of my research on this topic led to using boto3 in Python. Unfortunately, this library also requires a key and ID. The reason for not using a secret key and ID is that my organization has a separate department which takes care of granting access to S3 buckets, and that department can only create an IAM role with read and write access. I am curious what the common industry practice is for loading JSON in your organization?
You can make unsigned requests to S3 through a VPC Endpoint (VPCE), and you don't need any AWS credentials this way.
# https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-privatelink.html
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client(
    's3',
    config=Config(signature_version=UNSIGNED),
    endpoint_url="https://bucket.vpce-xxx-xxx.s3.ap-northeast-1.vpce.amazonaws.com",
)
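With that unsigned client in hand, uploading the JSON response directly could look like the sketch below (the API URL, bucket name and object key are purely illustrative):

import json
import requests

response = requests.get("https://api.example.com/data")  # hypothetical API endpoint
s3.put_object(
    Bucket="my-bucket",           # hypothetical bucket name
    Key="responses/data.json",    # hypothetical object key
    Body=json.dumps(response.json()),
    ContentType="application/json",
)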
You can restrict the source IP by setting a security group on the VPC endpoint to protect your S3 bucket. Note that the owner of S3 objects uploaded via unsigned requests is anonymous, which may cause some side effects. In my case, lifecycle rules could not be applied to those S3 objects.
I am playing with amazon-s3. My use case is to just list keys starting with a prefix.
import boto3

s3 = boto3.client('s3', "eu-west-1")
response = s3.list_objects_v2(Bucket="my-bucket", MaxKeys=1, Prefix="my/prefix/")
for content in response['Contents']:
    print(content['Key'])
my-bucket is located in us-east-1. I am surprised that my Python client in eu-west-1 is able to make such requests. For reference, my Scala client:
val client = AmazonS3ClientBuilder.standard().build()
client.listObjectsV2("my-bucket", "my-prefix")
which gives an error
com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-east-1. Please use this region to retry the request
which is expected.
My question is: why is the S3 client location dependent? Is there any advantage to choosing the right location? Is there any hidden cost to not matching the location?
My question is: why is the S3 client location dependent?
Because buckets are regional resources, even if they pretend not to be. Although an S3 bucket is globally accessible, the underlying resources are still hosted in a specific AWS region. If you're using the AWS client SDKs to access those resources, you need to connect to the bucket's regional S3 endpoint.
Is there any advantage to choosing the right location?
Lower latency. If your services are in eu-west-1, it makes sense to have your buckets there too. You also will not pay cross-region data transfer rates, but rather AWS's internal in-region rate.
Is there any hidden cost to not matching the location?
Yes. Costs for data egress vary by region, and you pay more to send data from one region to another than you do to send data between services in the same region.
As to why the boto3 library is not raising an error, it is possibly interrogating the S3 api under the hood to establish where the bucket is located before issuing the list_objects_v2 call.
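If you would rather be explicit than rely on that behaviour, here is a small sketch that first looks up the bucket's region and then builds a client in that region (the bucket name is illustrative):

import boto3

# get_bucket_location returns None as the LocationConstraint for us-east-1,
# so fall back to it explicitly.
generic_client = boto3.client('s3')
location = generic_client.get_bucket_location(Bucket="my-bucket")
region = location["LocationConstraint"] or "us-east-1"

s3 = boto3.client('s3', region_name=region)
response = s3.list_objects_v2(Bucket="my-bucket", MaxKeys=1, Prefix="my/prefix/")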
I am trying to move files older than an hour from one S3 bucket to another using a Python boto3 AWS Lambda function, covering the following cases:
Both buckets can be in same account and different region.
Both buckets can be in different account and different region.
Both buckets can be in different account and same region.
I got some help to move files using the Python code below, provided by John Rotenstein:
import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page['Contents']:
        if object['LastModified'] < datetime.now().astimezone() - timedelta(hours=1):  # <-- Change time period here
            print(f"Moving {object['Key']}")

            # Copy object
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=object['Key'],
                CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']}
            )

            # Delete original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])
How can this be modified to cater for these requirements?
An alternate approach would be to use Amazon S3 Replication, which can replicate bucket contents:
Within the same region, or between regions
Within the same AWS Account, or between different Accounts
Replication is frequently used when organizations need another copy of their data in a different region, or simply for backup purposes. For example, critical company information can be replicated to another AWS Account that is not accessible to normal users. This way, if some data was deleted, there is another copy of it elsewhere.
Replication requires versioning to be activated on both the source and destination buckets. If you require encryption, use standard Amazon S3 encryption options. The data will also be encrypted during transit.
You configure a source bucket and a destination bucket, then specify which objects to replicate by providing a prefix or a tag. Objects will only be replicated once Replication is activated. Existing objects will not be copied. Deletion is intentionally not replicated to avoid malicious actions. See: What Does Amazon S3 Replicate?
There is no "additional" cost for S3 Replication itself, but you will still be charged for Data Transfer when moving objects between regions and for API requests (which are tiny charges), plus storage of course.
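If you prefer to configure replication from boto3 rather than the console, a rough sketch is below. The IAM role ARN and bucket names are placeholders, both buckets must already have versioning enabled, and the replication role itself still has to be created separately:

import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
    Bucket='bucket-a',  # source bucket (placeholder name)
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::111111111111:role/replication-role',  # placeholder ARN
        'Rules': [{
            'ID': 'replicate-everything',
            'Priority': 1,
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},  # empty prefix = replicate all objects
            'DeleteMarkerReplication': {'Status': 'Disabled'},
            'Destination': {'Bucket': 'arn:aws:s3:::bucket-b'},  # placeholder destination ARN
        }],
    },
)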
Moving between regions
This is a non-issue. You can just copy the object between buckets and Amazon S3 will figure it out.
Moving between accounts
This is a bit harder because the code will use a single set of credentials, which must have ListBucket and GetObject access on the source bucket, plus PutObject rights on the destination bucket.
Also, if credentials are being used from the Source account, then the copy must be performed with ACL='bucket-owner-full-control' otherwise the Destination account won't have access rights to the object. This is not required when the copy is being performed with credentials from the Destination account.
Let's say that the Lambda code is running in Account-A and is copying an object to Account-B. An IAM Role (Role-A) is assigned to the Lambda function. It's pretty easy to give Role-A access to the buckets in Account-A. However, the Lambda function will need permissions to PutObject in the bucket (Bucket-B) in Account-B. Therefore, you'll need to add a bucket policy to Bucket-B that allows Role-A to PutObject into the bucket. This way, Role-A has permission to read from Bucket-A and write to Bucket-B.
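As a rough illustration of that bucket policy, here is a sketch of what could be attached to Bucket-B from Account-B; the account ID, role name and bucket name are all placeholders:

import json
import boto3

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowRoleAToPutObjects",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:role/Role-A"},  # placeholder account ID
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::bucket-b/*"
    }]
}

# Apply the policy using credentials from Account-B (the bucket owner).
s3_account_b = boto3.client('s3')
s3_account_b.put_bucket_policy(Bucket='bucket-b', Policy=json.dumps(bucket_policy))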
So, putting it all together:
Create an IAM Role (Role-A) for the Lambda function
Give the role Read/Write access as necessary for buckets in the same account
For buckets in other accounts, add a Bucket Policy that grants the necessary access permissions to the IAM Role (Role-A)
In the copy_object() command, include ACL='bucket-owner-full-control' (this is the only coding change needed; see the sketch after this list)
Don't worry about doing anything extra for cross-region; it should just work automatically
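A minimal sketch of that single change to the copy step from the code above, assuming Role-A's credentials and the bucket policy on Bucket-B are already in place:

# Same copy as before, with the ACL added so the destination account
# (the bucket owner) gets full control of the copied object.
s3_client.copy_object(
    Bucket=DESTINATION_BUCKET,
    Key=object['Key'],
    CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']},
    ACL='bucket-owner-full-control',
)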
Recently my code has been hitting intermittent issues with an S3 bucket. The code basically downloads, parses and re-uploads files. I'm using Python and boto (not boto3) for the S3 actions; my version is boto 2.36.0. Do you have any idea why S3 sometimes returns this kind of error?
Based on their documentation
https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
Sample Response with Error Specified in Body
The following response indicates that an error occurred after the HTTP response header was sent. Note that while the HTTP status code is 200 OK, the request actually failed as described in the Error element.
But that still isn't really a good explanation of what's going on and why it only happens sometimes.
I've tried some manual uploads to my bucket using the same boto version, but I haven't noticed anything unusual when doing it manually.
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InternalError</Code>
<Message>We encountered an internal error. Please try again.</Message>
<RequestId>A127747D40AB1AC3</RequestId>
<HostId>Clz3f+rO2K1KfD0ZwSkpa9WnPvUh/mngdi99eDiSbdR0uzOP5a7RcYUem6ILYbtQdIJ02aUw2M4=</HostId>
</Error>
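Since the error body itself says "Please try again", one common mitigation is simply to retry the upload a few times with a short backoff. A rough sketch with boto 2, assuming the InternalError surfaces as an S3ResponseError (the function name and backoff values are illustrative):

import time
from boto.exception import S3ResponseError

def upload_with_retry(bucket, key_name, data, attempts=3):
    # S3's InternalError is transient; retrying usually succeeds.
    for attempt in range(1, attempts + 1):
        try:
            key = bucket.new_key(key_name)
            key.set_contents_from_string(data)
            return
        except S3ResponseError as err:
            if err.error_code == 'InternalError' and attempt < attempts:
                time.sleep(2 ** attempt)  # simple exponential backoff
                continue
            raise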
I'm experiencing a problem with permissions on a file in AWS S3 after updating it with a Python script using the Boto library.
Here is the command that I'm using to update the file in S3:
k.set_contents_from_string(json.dumps(parsed_json, indent=4))
The file gets updated correctly; however, the permissions get changed, which is very strange to me.
Before updating the file, the permissions are:
Grantee: awsprod, with Open/Download (checked), View Permissions (checked), Edit Permissions (checked)
After updating the file, the permissions are gone and nothing shows under Permissions when viewing the file through the AWS Dashboard/Console.
After the permissions are removed, downloading the file is no longer possible and it fails every time.
Via aws cli:
A client error (403) occurred when calling the HeadObject operation: Forbidden
Completed 1 part(s) with ... file(s) remaining
Via aws console:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>A6E4AA2E3A3B9429</RequestId>
<HostId>
47gMhTpdFRAYm1cP4noivQlNEeB/cxHr2QFRXewNERdYcGcan2QU/fOVQ/upOl7Zp9fNIXLUnkk=
</HostId>
</Error>
My access is via IAM, and my user has the "AmazonS3FullAccess" policy giving me full access to S3.
What is even more intriguing is that I have two AWS accounts, and the same script works well in one of them without changing the file permissions, while on the other account I have the problem described above. Now, you might think that I have different S3 policy access between these two accounts. I already checked that, and both accounts have the "AmazonS3FullAccess" policy.
So, even if this is a problem on the AWS account setup, I would like to add to my script a line to set the file permissions back to the way it was before updating it. I think that should do the trick and allow me to download the file after running the Python script.
How can I set file permissions (not bucket) in S3 using Boto?
Set the policy with one of the canned ACL strings, something like:
k.set_contents_from_string(json.dumps(parsed_json, indent=4), policy=<ACL STRING>)
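For example, assuming the goal is to hand full control back to the bucket owner (the canned ACL chosen here is just one common option, not necessarily the exact grants you had before):

# 'bucket-owner-full-control' is one of boto's canned ACL strings; pick the
# one that matches the grants you actually want to restore.
k.set_contents_from_string(
    json.dumps(parsed_json, indent=4),
    policy='bucket-owner-full-control',
)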