Boto3: S3 folder not getting deleted - Python

I have a directory 'test' in my S3 bucket, and I want to delete it.
This is what I'm doing:
s3 = boto3.resource('s3')
s3.Object(S3Bucket,'test').delete()
and I get a response like this:
{'ResponseMetadata': {'HTTPStatusCode': 204, 'HostId':
'************', 'RequestId': '**********'}}
but my directory is not getting deleted!
I tried all combinations of '/test', 'test/' and '/test/', etc., both with a file inside the directory and with the directory empty, and all of them failed to delete 'test'.

delete_objects enables you to delete multiple objects from a bucket using a single HTTP request. You may specify up to 1000 keys.
https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.delete_objects
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')

objects_to_delete = []
for obj in bucket.objects.filter(Prefix='test/'):
    objects_to_delete.append({'Key': obj.key})

bucket.delete_objects(
    Delete={
        'Objects': objects_to_delete
    }
)

NOTE: See Daniel Levinson's answer for a more efficient way of deleting multiple objects.
In S3, there are no directories, only keys. If a key name contains a / such as prefix/my-key.txt, then the AWS console groups all the keys that share this prefix together for convenience.
To delete a "directory", you would have to find all the keys whose names start with the directory name and delete each one individually. Fortunately, boto3 provides a filter function to return only the keys that start with a certain string. So you can do something like this:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')

for obj in bucket.objects.filter(Prefix='test/'):
    s3.Object(bucket.name, obj.key).delete()

Related

Copy a large number of files in S3 within the same bucket

I have a "directory" in an S3 bucket with ~80 TB, and I need to copy everything to another directory in the same bucket:
source = s3://mybucket/abc/process/
destination = s3://mybucket/cde/process/
I already tried aws s3 sync, but it only worked for the big files and still left 50 TB to copy. I'm thinking about using boto3 code like the example below, but I don't know how to do this for multiple files/directories recursively.
s3 = boto3.resource('s3')

copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
How can I do this using boto3?
While there may be better ways of doing this using bucket policies, it can be done using boto3.
First, you will need to get a list of the contents of the bucket:
bucket_items = s3_client.list_objects_v2(Bucket=source_bucket, Prefix=source_prefix)
bucket_contents = bucket_items.get('Contents', [])
Where source_bucket is the name of the bucket and source_prefix is the name of the folder.
Next, you will iterate over the contents and, for each item, call the s3.meta.client.copy method like so:
for content in bucket_contents:
    copy_source = {
        'Bucket': source_bucket,
        'Key': content['Key']
    }
    s3.meta.client.copy(copy_source, source_bucket, destination_prefix + '/' + content['Key'].split('/')[-1])
The contents are a list of dictionaries, so you must use 'Key' to get the name of each item and use split to break it into a prefix and a file name.
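Note that list_objects_v2 returns at most 1,000 keys per call, so for a "directory" this large you would need to paginate. A rough sketch (the bucket and prefix names below are placeholders taken from the question) might look like this:
import boto3

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

source_bucket = 'mybucket'          # placeholder bucket name
source_prefix = 'abc/process/'      # placeholder source "directory"
destination_prefix = 'cde/process'  # placeholder destination "directory"

# Paginate so more than 1,000 objects are handled.
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
    for content in page.get('Contents', []):
        copy_source = {'Bucket': source_bucket, 'Key': content['Key']}
        # Keep the part of the key after the source prefix so nested
        # "sub-directories" are preserved under the destination prefix.
        relative_key = content['Key'][len(source_prefix):]
        s3.meta.client.copy(copy_source, source_bucket,
                            destination_prefix + '/' + relative_key)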

AWS Lambda to delete everything from a specific folder in an S3 bucket

I'm trying to delete everything from a specific folder in an S3 bucket with AWS Lambda using Python. The Lambda runs successfully; however, the files still exist in "folder1". There will be no sub-folders under this folder, only files.
Could someone please assist? Here is the code:
import json
import os
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    deletefile_bucket = s3.Bucket('test_bucket')
    response = deletefile_bucket.delete_objects(
        Delete={
            'Objects': [
                {
                    'Key': 'folder1/'
                },
            ],
        }
    )
The delete_objects() command requires a list of object keys to delete. It does not perform wildcard operations and it does not delete the contents of subdirectories.
You will need to obtain a listing of all objects and then specifically request those objects to be deleted.
The delete_objects() command accepts up to 1000 objects to delete.
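As a rough sketch of that approach (reusing the 'test_bucket' and 'folder1/' names from the question), you could list the keys under the prefix and pass them to delete_objects() in batches of at most 1,000:
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('test_bucket')  # bucket name from the question

    # Collect every key under the prefix, then delete in batches of 1,000.
    keys = [{'Key': obj.key} for obj in bucket.objects.filter(Prefix='folder1/')]
    for i in range(0, len(keys), 1000):
        bucket.delete_objects(Delete={'Objects': keys[i:i + 1000]})

    return {'deleted': len(keys)}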

Delete files under S3 bucket recursively without deleting folders using python

I'm getting an error when I try to delete all files under a specific folder.
The problem is here: 'Key': 'testpart1/*.*'
I would also like to delete files older than 30 days; please help me with the script.
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.delete_objects(
    Delete={
        'Objects': [
            {
                'Key': 'testpart1/*.*'  # the_name of_your_file
            }
        ]
    }
)
The code below will delete all files under the prefix recursively:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.objects.filter(Prefix="testpart1/").delete()
Please check https://stackoverflow.com/a/59146547/4214976 to filter out objects based on date.
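For the 30-day requirement, a rough sketch (an illustration only, not taken from the linked answer) could compare each object's LastModified timestamp against a cutoff before deleting it:
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')

# Delete objects under the prefix whose LastModified is older than 30 days.
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
for obj in my_bucket.objects.filter(Prefix='testpart1/'):
    if obj.last_modified < cutoff:
        obj.delete()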

How to list all the contents in a given key on Amazon S3 using Apache Libcloud?

The code to list contents in S3 using boto3 is known:
self.s3_client = boto3.client(
    u's3',
    aws_access_key_id=config.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=config.AWS_SECRET_ACCESS_KEY,
    region_name=config.region_name,
    config=Config(signature_version='s3v4')
)
versions = self.s3_client.list_objects(Bucket=self.bucket_name, Prefix=self.package_s3_version_key)
However, I need to list contents on S3 using libcloud. I could not find it in the documentation.
If you are just looking for all the contents for a specific bucket:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cls = get_driver(Provider.S3)
s3 = cls(aws_id, aws_secret)

container = s3.get_container(container_name='name')
objects = s3.list_container_objects(container)
s3.download_object(objects[0], '/path/to/download')
The resulting objects list contains every key in that bucket, along with filename, byte size, and metadata. To download, call the download_object method on s3 with the full libcloud Object and your file path.
If you'd rather get all objects of all buckets, change get_container to list_containers with no parameters.
Information for all driver methods: https://libcloud.readthedocs.io/en/latest/storage/api.html
Short examples specific to s3: https://libcloud.readthedocs.io/en/latest/storage/drivers/s3.html
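As a small illustrative sketch of that last point (aws_id and aws_secret are placeholder credentials, as above), you could enumerate every object in every bucket like this:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cls = get_driver(Provider.S3)
s3 = cls(aws_id, aws_secret)  # placeholder credentials

# Walk every container (bucket) and print each object's name and size.
for container in s3.list_containers():
    for obj in s3.list_container_objects(container):
        print(container.name, obj.name, obj.size)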

How to list S3 bucket Delimiter paths

How to list S3 bucket Delimiter paths?
Basically, I want to list all of the "directories" and/or "sub-directories" in an S3 bucket. I know these don't physically exist. Essentially, I want all the objects that contain the delimiter, and then to return only the key path before the delimiter. Starting under a prefix would be even better, but the bucket level should be enough.
Example S3 Bucket:
root.json
/2018/cats/fluffy.png
/2018/cats/gary.png
/2018/dogs/rover.png
/2018/dogs/jax.png
I would like to then do something like:
s3_client = boto3.client('s3')
s3_client.list_objects(only_show_delimiter_paths=True)
Result
/2018/
/2018/cats/
/2018/dogs/
I don't see any way to do this natively using: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
I could pull all the object names and do this in my application code but that seems inefficient.
The Amazon S3 page in boto3 has this example:
List top-level common prefixes in Amazon S3 bucket
This example shows how to list all of the top-level common prefixes in an Amazon S3 bucket:
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket='my-bucket', Delimiter='/')
for prefix in result.search('CommonPrefixes'):
    print(prefix.get('Prefix'))
But, it only shows top-level prefixes.
So, here's some code to print all the 'folders':
import boto3
client = boto3.client('s3')
objects = client.list_objects_v2(Bucket='my-bucket')
keys = [o['Key'] for o in objects['Contents']]
folders = {k[:k.rfind('/')+1] for k in keys if k.rfind('/') != -1}
print('\n'.join(folders))
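If the bucket holds more than 1,000 objects, a paginated variant of the same idea (a sketch, not part of the original answer) would be:
import boto3

client = boto3.client('s3')

# Page through every key and collect each distinct 'folder' prefix.
folders = set()
paginator = client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if key.rfind('/') != -1:
            folders.add(key[:key.rfind('/') + 1])

print('\n'.join(sorted(folders)))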
