Delete subfolder within S3 bucket - python

I have a subfolder that I want to delete from an S3 bucket.
I want to delete the subfolder folder2 in:
mybucket/folder1/folder2/folder3
The name of folder2 will vary, so I'm trying to loop through the objects in folder1 and somehow delete the second-level folder. The current code I have works ONLY if folder3 (the subfolder of folder2) doesn't exist.
import boto3

s3 = boto3.resource('s3')
client = boto3.client('s3')

bucket = s3.Bucket(mybucket)
result = client.list_objects_v2(Bucket=mybucket, Delimiter='/', Prefix="folder1/")
for object in result.get('CommonPrefixes'):
    subfolder = object.get('Prefix')
    s3.Object(mybucket, subfolder).delete()

The thing you have to remember about Amazon S3 is that folders, in the sense you think of them, don't exist. They're not real. S3 is object storage, which means it thinks in objects, not files. The fact that the console renders things that look like file paths with subfolders and so forth is just for convenience.
So instead of trying to delete a folder, you want to delete all objects whose keys begin with that prefix. Your current code deletes only the single object named folder1/folder2/ (the zero-length placeholder the console creates for an "empty folder"), which is why it works only while nothing exists underneath it.
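For example, a minimal boto3 sketch of that approach (the bucket name is a placeholder, and the layout matches the question):

import boto3

s3 = boto3.resource('s3')
client = boto3.client('s3')
bucket = s3.Bucket('mybucket')

# Find each second-level "folder" (common prefix) under folder1/ ...
result = client.list_objects_v2(Bucket='mybucket', Delimiter='/', Prefix='folder1/')
for common_prefix in result.get('CommonPrefixes', []):
    prefix = common_prefix['Prefix']  # e.g. 'folder1/folder2/'
    # ... and delete every object whose key starts with it, which also
    # removes folder3 and anything else nested underneath.
    bucket.objects.filter(Prefix=prefix).delete()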

Related

How do I copy subfolders into another location

I'm making a program to back up files and folders to a destination.
The problem I'm currently facing is that if I have a folder inside a folder and so on, with files in between them, I can't sync them at the destination.
e.g.:
The source contains folder 1 and file 2. Folder 1 contains folder 2, folder 2 contains folder 3 and files etc...
The backup only contains folder 1 and file 2.
If the backup doesn't exist I simply use shutil.copytree(path, path_backup), but in the case where I need to sync, I can't get the files and folders, or at least I'm not seeing a way to do it. I have walked the directory with for path, dir, files in os.walk(directory) and even used what someone suggested in another post:
def walk_folder(target_path, path_backup):
    for files in os.scandir(target_path):
        if os.path.isfile(files):
            file_name = os.path.abspath(files)
            print(file_name)
            os.makedirs(path_backup)
        elif os.path.isdir(files):
            walk_folder(files, path_backup)
Is there a way to create the directories in the backup folder from the ground up and then add the files alongside them, or is the only way to delete the whole folder and use shutil.copytree(path, path_backup)?
With makedirs, all it does is complain that it can't create the folder because it already exists. This is understandable, as it's trying to write in the source folder and not in the backup. Is there a way to rewrite the path so that the source is replaced by the backup?
If any more code is needed feel free to ask!
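For reference, a common way to do that path mapping is os.path.relpath: compute each directory's position relative to the source root and re-root it under the backup. A minimal sketch (not the asker's code; names are illustrative):

import os
import shutil

def sync_folder(source, backup):
    # Walk the whole source tree.
    for path, dirs, files in os.walk(source):
        # Re-root the current directory under the backup folder.
        relative = os.path.relpath(path, source)
        backup_path = os.path.join(backup, relative)
        os.makedirs(backup_path, exist_ok=True)  # no error if it already exists
        for name in files:
            shutil.copy2(os.path.join(path, name), os.path.join(backup_path, name))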

deleting a folder inside gcp bucket

I have a temporary folder in a GCP bucket that I want to delete along with all of its content. What I thought of is passing the path of this temp folder as a prefix, listing all blobs inside it, and deleting every blob. But I had doubts whether that would delete the blobs inside it without deleting the folder itself. Is that right? The end goal is to find folder1/ empty, without the temp folder. But when I tried it, I found unexpected behavior: the folder that contains this temp folder was deleted!
For example, if we have folder1/tempfolder/file1.csv and folder1/tempfolder/file2.csv, I found that folder1 is deleted after applying my changes. Here are the changes applied:
delete_folder_on_uri(storage_client, "gs://bucketname/folder1/tempfolder/")

def delete_folder_on_uri(storage_client, folder_gcs_uri):
    bucket_uri = BucketUri(folder_gcs_uri)
    delete_folder(storage_client, bucket_uri.name, bucket_uri.key)

def delete_folder(storage_client, bucket_name, folder):
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=folder)
    for blob in blobs:
        blob.delete()
PS: BucketUri is a class in which bucket_uri.name retrieves the bucket name and bucket_uri.key retrieves the path, which is folder1/tempfolder/.
There is no such thing as a folder in GCS at all. The "folder" is just a human-friendly representation of an object's (or file's) prefix, so the full path looks like gs://bucket-name/prefix/another-prefix/object-name. As there are no folders (ontologically), there is nothing to delete.
What you need to do instead is delete all objects (files) whose names start with a given prefix in the bucket.
I think you are doing everything correctly. That is also why folder1 seemed to disappear: once the last object whose name starts with folder1/ was deleted, there was nothing left for the console to render as folder1. Upload any object under folder1/ and the "folder" will reappear.
Here is an old example (similar to your code): How to delete GCS folder from Python?
And here is the API description: Blobs / Objects. You might like to check the delete method.
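A self-contained sketch of that approach with the google-cloud-storage client (bucket name and prefix taken from the question):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("bucketname")

# Deleting every blob under the prefix removes the "folder" as a side
# effect, because the folder is nothing but the shared prefix.
for blob in bucket.list_blobs(prefix="folder1/tempfolder/"):
    blob.delete()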

Deleting s3 file also deletes the folder python code

I am trying to delete all the files inside a folder in an S3 bucket using the Python code below, but the folder is also getting deleted along with the files.
import boto3
s3 = boto3.resource('s3')
old_bucket_name='new-bucket'
old_prefix='data/'
old_bucket = s3.Bucket(old_bucket_name)
old_bucket.objects.filter(Prefix=old_prefix).delete()
S3 does not have folders. Object names can contain / and the console will represent objects with common prefixes that contain a / as folders, but the folder does not actually exist. If you're looking to have that visual representation, you can create a zero-length object that ends with a / which is basically equivalent to what the console does if you create a folder via the UI.
Relevant docs can be found here
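If you want an empty data/ "folder" to remain visible in the console after its files are deleted, a sketch of recreating that placeholder (bucket and prefix from the question):

import boto3

s3 = boto3.client('s3')

# A zero-length object whose key ends in '/' is what the console
# renders as an empty folder.
s3.put_object(Bucket='new-bucket', Key='data/')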

Issue in list() method of module boto

I am using the list method as:
all_keys = self.s3_bucket.list(self.s3_path)
The bucket path "s3_path" contains files and folders. The return value of the above line is confusing. It is returning:
The parent directory
A few directories, but not all of them
All the files in the folder and its subfolders
I had assumed it would return files only.
There is actually no such thing as a folder in Amazon S3. It is just provided for convenience. Objects can be stored in a given path even if a folder with that path does not exist. The Key of the object is the full path plus the filename.
For example, this will copy a file even if the folder does not exist:
aws s3 cp file.txt s3://my-bucket/foo/bar/file.txt
This will not create the /foo/bar folder. It simply creates an object with a Key of: /foo/bar/file.txt
However, if folders are created in the S3 Management Console, a zero-length file is created with the name of the folder so that it appears in the console. When listing files, this will appear as the name of the directory, but it is actually the name of a zero-length file.
That is why some directories might appear but not others: it depends on whether they were specifically created, or whether objects were simply stored in that path.
Bottom line: Amazon S3 is an object storage system. It is really just a big Key/Value store -- the Key is the name of the Object, the Value is the contents of the object. Do not assume it works the same as a traditional file system.
If you have a lot of items in the bucket, the results of a list_objects will be paginated. By default, it will return up to 1000 items. See the Boto docs to learn how to use Marker to paginate through all items.
Oh, looks like you're on Boto 2. For you, it will be BucketListResultSet.
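For comparison, if you move to boto3, pagination is handled by a paginator; a minimal sketch (bucket and prefix are placeholders):

import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')

# The paginator follows continuation tokens for you, so this loop sees
# every matching key, not just the first 1000.
for page in paginator.paginate(Bucket='my-bucket', Prefix='some/path/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])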

Can you list all folders in an S3 bucket?

I have a bucket containing a number of folders, and each folder contains a number of images. Is it possible to list all the folders without iterating through all the keys (folders and images) in the bucket? I'm using Python and boto.
You can use list() with an empty prefix (first parameter) and a folder delimiter (second parameter) to achieve what you're asking for:
import boto

s3conn = boto.connect_s3(access_key, secret_key, security_token=token)
bucket = s3conn.get_bucket(bucket_name)
folders = bucket.list('', '/')
for folder in folders:
    print folder.name
Remark:
In S3 there is no such thing as "folders". All you have is buckets and objects.
The objects represent files. When you name a file: name-of-folder/name-of-file it will look as if it's a file: name-of-file that resides inside folder: name-of-folder - but in reality there's no such thing as the "folder".
You can also use the AWS CLI (Command Line Interface): the command aws s3 ls s3://<bucket-name> will list only the "folders" at the first level of the bucket.
Yes! You can list them by using a prefix and a delimiter on the key listing. Have a look at the following documentation:
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
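A sketch of that hierarchical listing in boto3 ('my-bucket' is a placeholder):

import boto3

client = boto3.client('s3')

# With Delimiter='/', keys that share a prefix are rolled up into
# CommonPrefixes entries, one per top-level "folder".
response = client.list_objects_v2(Bucket='my-bucket', Delimiter='/')
for cp in response.get('CommonPrefixes', []):
    print(cp['Prefix'])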
