Delete files under S3 bucket recursively without deleting folders using Python

I'm getting an error when I try to delete all files under a specific folder.
The problem is here: 'Key': 'testpart1/*.*'
I would also like to delete files that are older than 30 days. Please help me with the script.
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.delete_objects(
    Delete={
        'Objects': [
            {
                'Key': 'testpart1/*.*'  # the name of your file
            }
        ]
    }
)

The code below will delete all files under the prefix recursively:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.objects.filter(Prefix="testpart1/").delete()
Please check https://stackoverflow.com/a/59146547/4214976 for how to filter the objects based on date.
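For the 30-day part of the question, a minimal sketch along those lines, assuming the same bucket and prefix as above (it relies on last_modified being a timezone-aware datetime, which boto3 returns):
import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')

# Anything last modified more than 30 days ago is deleted
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for obj in my_bucket.objects.filter(Prefix="testpart1/"):
    if obj.last_modified < cutoff:
        obj.delete()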

Related

How to delete file(s) from source s3 bucket after lambda successfully copies file(s) to destination s3 bucket?

I have the following Lambda function, which uses an S3 trigger to copy files from a source to a destination bucket. This is working fine.
import os
import logging
import boto3
LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)
DST_BUCKET = os.environ.get('DST_BUCKET')
REGION = os.environ.get('REGION')
s3 = boto3.resource('s3', region_name=REGION)
def handler(event, context):
    LOGGER.info('Event structure: %s', event)
    LOGGER.info('DST_BUCKET: %s', DST_BUCKET)
    for record in event['Records']:
        src_bucket = record['s3']['bucket']['name']
        src_key = record['s3']['object']['key']
        copy_source = {
            'Bucket': src_bucket,
            'Key': src_key
        }
        LOGGER.info('copy_source: %s', copy_source)
        bucket = s3.Bucket(DST_BUCKET)
        bucket.copy(copy_source, src_key)
    return {
        'status': 'ok'
    }
What I want to do now is modify the code above to delete the file(s) (not the folder) from the source bucket after a successful upload to the destination bucket.
Use case: a user uploads three files, two legitimate CSV files and one corrupted CSV file. The Lambda triggers on the source bucket and begins copying those files. It loops through the files, outputting true when a file copies successfully (or false, along with the filename, when there is an issue), then deletes the successfully uploaded files from the source bucket.
I've tried various try/except blocks for this, but it ends up either deleting the entire folder, or there are synchronization issues between the buckets where the file is deleted from the source folder before the upload has succeeded, etc.
I don't want to do away with the loop above, so that if multiple files are uploaded it will loop through all of them and likewise delete each one once it has been successfully uploaded to the other bucket. I'm unsure whether a simple boolean would be sufficient for this use case, or whether I need another flag of some sort. The flag would have to keep track of the specific key, though, so that it knows which files succeeded and which did not.
Before removing the file from the source bucket, you can verify that it was uploaded correctly using s3.Object(DST_BUCKET, src_key).load():
import os
import logging
import boto3
LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)
DST_BUCKET = os.environ.get('DST_BUCKET')
REGION = os.environ.get('REGION')
s3 = boto3.resource('s3', region_name=REGION)
def handler(event, context):
    LOGGER.info(f'Event structure: {event}')
    LOGGER.info(f'DST_BUCKET: {DST_BUCKET}')
    for record in event['Records']:
        src_bucket = record['s3']['bucket']['name']
        src_key = record['s3']['object']['key']
        copy_source = {
            'Bucket': src_bucket,
            'Key': src_key
        }
        LOGGER.info(f'copy_source: {copy_source}')
        bucket = s3.Bucket(DST_BUCKET)
        bucket.copy(copy_source, src_key)
        try:
            # Check that the file exists in the destination bucket
            s3.Object(DST_BUCKET, src_key).load()
            LOGGER.info(f"File {src_key} uploaded to Bucket {DST_BUCKET}")
            # Delete the file from the source bucket
            s3.Object(src_bucket, src_key).delete()
            LOGGER.info(f"File {src_key} deleted from Bucket {src_bucket}")
        except Exception as e:
            return {"error": str(e)}
    return {'status': 'ok'}
I've tested it with files in two different regions and it worked great for me.

Copy a large number of files in S3 within the same bucket

I got a "directory" on a s3 bucket with 80 TB ~ and I need do copy everything to another directory in the same bucket
source = s3://mybucket/abc/process/
destiny = s3://mybucket/cde/process/
I already tried to use aws s3 sync, but worked only for the big files, still left 50 TB to copy. I'm thinking about to use a boto3 code as this example below, but I don't know how to do for multiple files/directories recursively.
import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
How can I do this using boto3?
While there may be better ways of doing this using bucket policies, it can be done using boto3.
First, you will need to get a list of the contents of the bucket:
bucket_items = s3_client.list_objects_v2(Bucket=source_bucket,Prefix=source_prefix)
bucket_contents = bucket_items.get('Contents',[])
Where source_bucket is the name of the bucket and source_prefix is the name of the folder.
Next, you will iterate over the contents and, for each item, call the s3.meta.client.copy method like so:
for content in bucket_contents:
    copy_source = {
        'Bucket': source_bucket,
        'Key': content['Key']
    }
    s3.meta.client.copy(copy_source, source_bucket, destination_prefix + '/' + content['Key'].split('/')[-1])
Each item in the contents list is a dictionary, so you must use 'Key' to get the name of the item, and split() to break it into a prefix and a file name.
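One caveat: list_objects_v2 returns at most 1,000 keys per call, so for 80 TB of objects you would need to paginate. A rough sketch using boto3's paginator, with the bucket and prefixes from the question, that also preserves any subdirectory structure under the source prefix rather than flattening keys to the file name:
import boto3

s3_client = boto3.client('s3')

source_bucket = 'mybucket'
source_prefix = 'abc/process/'
destination_prefix = 'cde/process/'

paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
    for content in page.get('Contents', []):
        copy_source = {'Bucket': source_bucket, 'Key': content['Key']}
        # Re-root the key under the destination prefix, keeping sub-paths intact
        new_key = destination_prefix + content['Key'][len(source_prefix):]
        s3_client.copy(copy_source, source_bucket, new_key)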

AWS Lambda to delete everything from a specific folder in an S3 bucket

I'm trying to delete everything from a specific folder in an S3 bucket with AWS Lambda using Python. The Lambda runs successfully; however, the files still exist in "folder1". There will be no sub-folders under this folder, only files.
Could someone please assist? Here is the code:
import json
import os
import boto3
def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    deletefile_bucket = s3.Bucket('test_bucket')
    response = deletefile_bucket.delete_objects(
        Delete={
            'Objects': [
                {
                    'Key': 'folder1/'
                },
            ],
        }
    )
The delete_objects() command requires a list of object keys to delete. It does not perform wildcard operations and it does not delete the contents of subdirectories.
You will need to obtain a listing of all objects and then specifically request those objects to be deleted.
The delete_objects() command accepts up to 1000 objects to delete.
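A minimal sketch of that approach, assuming the bucket and folder names from the question, and relying on the fact that each list_objects_v2 page holds at most 1,000 keys (the same limit delete_objects accepts):
import boto3

s3_client = boto3.client('s3')

bucket = 'test_bucket'
prefix = 'folder1/'

paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    contents = page.get('Contents', [])
    if not contents:
        continue
    # Delete every key listed on this page in a single request
    s3_client.delete_objects(
        Bucket=bucket,
        Delete={'Objects': [{'Key': obj['Key']} for obj in contents]}
    )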

AWS Lambda - copy object to another S3 location

I'd like to write Lambda Python code to move files within the same S3 bucket.
[same S3 bucket]
/location-as-is/file.jpg
[same S3 bucket]
/location-to-be/file.jpg
How can I do that?
Thank you.
In order to get this to work, you will need a few things. First is the Lambda code itself. You should be able to use the Python SDK, boto3, to make the copy call. Here is an example of how to copy your file:
import json
import boto3
s3 = boto3.resource('s3')
def lambda_handler(event, context):
    my_bucket = "example-bucket"
    current_object_key = "fileA/keyA.jpg"
    new_object_key = "fileB/keyB.jpg"
    copy_source = {
        'Bucket': my_bucket,
        'Key': current_object_key
    }
    s3.meta.client.copy(copy_source, my_bucket, new_object_key)
You will also need to make sure your Lambda execution role has the proper S3 read and write permissions, and that your S3 bucket policy is configured to allow your Lambda role to access it.
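Since the goal is to move the file rather than just copy it, one option is to delete the source object once the copy has succeeded; a minimal sketch, reusing the variables from the example above:
# After a successful copy, remove the original so the operation behaves like a move
s3.Object(my_bucket, current_object_key).delete()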
You can use boto for this purpose, as below:
import boto
c = boto.connect_s3()
src_buc = c.get_bucket('Source_Bucket')
sink_buc = c.get_bucket('Sink_Bucket')
and then you can iterate over all your keys to copy the content:
for k in src_buc.list():
    # copy each key from the source bucket to the sink bucket
    sink_buc.copy_key(k.name, src_buc.name, k.name)

Boto3, s3 folder not getting deleted

I have a directory 'test' in my S3 bucket, and I want to delete this directory.
This is what I'm doing:
s3 = boto3.resource('s3')
s3.Object(S3Bucket,'test').delete()
and I'm getting a response like this:
{'ResponseMetadata': {'HTTPStatusCode': 204, 'HostId':
'************', 'RequestId': '**********'}}
but my directory is not getting deleted!
I tried all combinations of '/test', 'test/' and '/test/', etc., both with a file inside that directory and with an empty directory, and all failed to delete 'test'.
delete_objects enables you to delete multiple objects from a bucket using a single HTTP request. You may specify up to 1000 keys.
https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.delete_objects
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objects_to_delete = []
for obj in bucket.objects.filter(Prefix='test/'):
    objects_to_delete.append({'Key': obj.key})

bucket.delete_objects(
    Delete={
        'Objects': objects_to_delete
    }
)
NOTE: See Daniel Levinson's answer for a more efficient way of deleting multiple objects.
In S3, there are no directories, only keys. If a key name contains a / such as prefix/my-key.txt, then the AWS console groups all the keys that share this prefix together for convenience.
To delete a "directory", you would have to find all the keys that whose names start with the directory name and delete each one individually. Fortunately, boto3 provides a filter function to return only the keys that start with a certain string. So you can do something like this:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')
for obj in bucket.objects.filter(Prefix='test/'):
    s3.Object(bucket.name, obj.key).delete()
