Deleting an S3 file also deletes the folder (Python)

I am trying to delete all the files inside a folder in an S3 bucket using the Python code below, but the folder is also getting deleted along with the files:
import boto3
s3 = boto3.resource('s3')
old_bucket_name='new-bucket'
old_prefix='data/'
old_bucket = s3.Bucket(old_bucket_name)
old_bucket.objects.filter(Prefix=old_prefix).delete()

S3 does not have folders. Object names can contain /, and the console will represent objects that share a common prefix containing a / as folders, but the folder does not actually exist. If you want that visual representation, you can create a zero-length object whose key ends with a /, which is essentially what the console does when you create a folder via the UI.
Relevant docs can be found here
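As a rough sketch of that idea (reusing the bucket and prefix names from the question, and assuming boto3's Bucket.put_object), you could recreate the zero-length marker right after the delete so the console keeps showing an empty data/ folder:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('new-bucket')
prefix = 'data/'
# Deletes every object under the prefix, including the zero-length folder marker.
bucket.objects.filter(Prefix=prefix).delete()
# Recreate the marker so the console still shows an (empty) data/ "folder".
bucket.put_object(Key=prefix, Body=b'')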

Related

Deleting a folder inside a GCP bucket

I have a temporary folder in a GCP bucket that I want to delete along with all of its contents. What I thought of is that I can pass the path of this temp folder as a prefix, list all the blobs inside it, and delete every blob, but I had doubts whether that would delete the blobs without deleting the folder itself. Is that right? The goal is to end up with folder1/ empty, without the temp folder, but when I tried it I found unexpected behavior: the folder that contains this temp folder was deleted!
For example, if we have folder1/tempfolder/file1.csv and folder1/tempfolder/file2.csv, I found that folder1 was deleted after applying my changes. Here are the changes applied:
delete_folder_on_uri(storage_client, "gs://bucketname/folder1/tempfolder/")
def delete_folder_on_uri(storage_client, folder_gcs_uri):
    bucket_uri = BucketUri(folder_gcs_uri)
    delete_folder(storage_client, bucket_uri.name, bucket_uri.key)
def delete_folder(storage_client, bucket_name, folder):
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=folder)
    for blob in blobs:
        blob.delete()
PS: BucketUri is a class in which bucket_uri.name retrieves the bucket name and bucket_uri.key retrieves the path which is folder1/tempfolder/
There is no such thing as a folder in GCS at all. The "folder" is just a human-friendly representation of an object's (or file's) prefix, so the full path looks like gs://bucket-name/prefix/another-prefix/object-name. As there are no folders (ontologically), there is nothing to delete.
Thus you might need to delete all objects (files) whose names start with a given prefix in the given bucket.
I think you are doing everything correctly.
Here is an old example (similar to your code) - How to delete GCS folder from Python?
And here is an API description - Blobs / Objects - you might like to check the delete method.
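If the intent is to keep folder1/ visible after the temp folder is gone, a minimal sketch (assuming the google-cloud-storage client and the bucket/path names from the question) would be to recreate a zero-length placeholder blob once the prefix has been deleted:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket("bucketname")
# Delete every blob under the temp folder prefix.
for blob in bucket.list_blobs(prefix="folder1/tempfolder/"):
    blob.delete()
# folder1/ only existed implicitly, so recreate a zero-length placeholder
# blob if you want it to keep showing up in the console.
bucket.blob("folder1/").upload_from_string(b"")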

S3 boto3 delete files except a specific file

Probably this is just a newbie question.
I have a Python script using the boto3 SDK and I need to delete all files from an S3 bucket except one file.
The issue is that the user updates this S3 bucket and places files into some folders. After I copy those files I need to delete them, and the problem is that the folders get deleted as well, since there is no concept of folders on cloud providers. I need to keep the "folder" structure intact. I was thinking of placing a dummy file inside each "folder" and excluding that file from deletion.
Is this doable?
If you create a zero-byte object whose key is the name of the folder you want, ending in a /, it will show up as an empty folder in the AWS Console, and other tools that enumerate objects delimited by prefix will see that prefix in their list of common prefixes:
import boto3
s3 = boto3.client('s3')
s3.put_object(Bucket='example-bucket', Key='example/folder/', Body=b'')
Then, as you enumerate the objects in order to delete them, ignore any key that ends in a /, since those are just the markers you're using for folders:
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='example-bucket', Prefix='example/folder/')
for cur in resp['Contents']:
    if cur['Key'].endswith('/'):
        print("Ignoring folder marker: " + cur['Key'])
    else:
        print("Should delete object: " + cur['Key'])
        # TODO: Delete this object
for file in files_in_bucket:
    if file.name != file_name_to_keep:
        file.delete()
Could you follow this sort of logic in your script?
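Putting the two snippets above together (the bucket name, prefix, and file to keep are placeholders here), the deletion step left as a TODO could be done in one batch with boto3's delete_objects, which accepts up to 1000 keys per call:
import boto3
s3 = boto3.client('s3')
file_name_to_keep = 'example/folder/keep-me.txt'  # placeholder for the file to exclude
resp = s3.list_objects_v2(Bucket='example-bucket', Prefix='example/folder/')
# Collect every key that is not a folder marker and is not the file to keep.
to_delete = [
    {'Key': obj['Key']}
    for obj in resp.get('Contents', [])
    if not obj['Key'].endswith('/') and obj['Key'] != file_name_to_keep
]
if to_delete:
    s3.delete_objects(Bucket='example-bucket', Delete={'Objects': to_delete})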

Issue in list() method of module boto

I am using the list method as:
all_keys = self.s3_bucket.list(self.s3_path)
The bucket "s3_path" contains files and folders. The return value of above line is confusing. It is returning:
Parent directory
A few directories not all
All the files in folder and subfolders.
I had assumed it would return files only.
There is actually no such thing as a folder in Amazon S3. It is just provided for convenience. Objects can be stored in a given path even if a folder with that path does not exist. The Key of the object is the full path plus the filename.
For example, this will copy a file even if the folder does not exist:
aws s3 cp file.txt s3://my-bucket/foo/bar/file.txt
This will not create a foo/bar folder. It simply creates an object with a Key of foo/bar/file.txt.
However, if folders are created in the S3 Management Console, a zero-length file is created with the name of the folder so that it appears in the console. When listing files, this will appear as the name of the directory, but it is actually the name of a zero-length file.
That is why some directories might appear but not others -- it depends on whether they were specifically created, or whether objects were simply stored in that path.
Bottom line: Amazon S3 is an object storage system. It is really just a big Key/Value store -- the Key is the name of the Object, the Value is the contents of the object. Do not assume it works the same as a traditional file system.
If you have a lot of items in the bucket, the results of a list_objects call will be paginated. By default, it will return up to 1000 items. See the Boto docs to learn how to use Marker to paginate through all items.
Oh, looks like you're on Boto 2. For you, it will be BucketListResultSet.
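For reference, a small sketch of the same filtering with Boto 2 (bucket and prefix names are placeholders), skipping the zero-length folder markers while the BucketListResultSet pages through the keys:
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')
# bucket.list() returns a BucketListResultSet that transparently pages
# through all keys under the prefix; skip the console-created folder markers.
for key in bucket.list(prefix='foo/bar/'):
    if key.name.endswith('/'):
        continue
    print(key.name)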

Import specific file from an S3 subfolder into Python

I am using boto library to import data from S3 into python following instructions: http://boto.cloudhackers.com/en/latest/s3_tut.html
The following code allows me to import all files in the main folder into Python, but replacing it with c.get_bucket('mainfolder/subfolder') does not work. Does anybody know how I can access a sub-folder and import its contents?
import boto
c = boto.connect_s3()
b = c.get_bucket('mainfolder')
The get_bucket method on the connection returns a Bucket object. To access individual files or directories within that bucket, you need to create a Key object with the file path, or use Bucket.list with a folder path as the prefix to get all the keys for files under that path. Each Key object acts as a handle for a stored file; you then call methods on the keys to manipulate the stored files. For example:
import boto
connection = boto.connect_s3()
bucket = connection.get_bucket('myBucketName')
fileKey = bucket.get_key('myFileName.txt')
print fileKey.get_contents_as_string()
for key in bucket.list('myFolderName'):
    print key.get_contents_as_string()
The example here simply prints out the contents of each file (which is probably a bad idea!). Depending on what you want to do with the files, you may want to download them to a temporary directory, read them into a variable, etc. See http://boto.cloudhackers.com/en/latest/ref/s3.html#module-boto.s3.key for the documentation on what can be done with keys.
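For instance, a small sketch of downloading everything under the "sub-folder" into a temporary directory (bucket and folder names are placeholders, using Boto 2's Key.get_contents_to_filename):
import os
import tempfile
import boto
connection = boto.connect_s3()
bucket = connection.get_bucket('myBucketName')
download_dir = tempfile.mkdtemp()
for key in bucket.list('myFolderName/'):
    if key.name.endswith('/'):
        continue  # skip zero-length folder markers
    local_path = os.path.join(download_dir, os.path.basename(key.name))
    key.get_contents_to_filename(local_path)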

How should I create the index of files using boto and S3 with Python and Django

I have a folder structure with 2000 files on S3.
I want to run a program every week that gets the list of files and folders from S3 and populates the database.
Then I use that database to show the same folder structure on the site.
I have two problems:
How can I get the list of folders from there and then store them in MySQL? Do I need to grab all the file names and then split on "/"? But it looks difficult to see which files belong to which folders. I have found this https://stackoverflow.com/a/17096755/1958218 but could not find where the listObjects() function is.
Doesn't the get_all_keys() method of the S3 bucket do what you need?
s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')
keys = b.get_all_keys()
then iterate over the keys, do os.path.split, and keep the unique directory parts...
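A minimal sketch of that last step (Boto 2 Key objects expose the full path as key.name; the bucket name is a placeholder):
import os
import boto
s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')
folders = set()
# get_all_keys() returns at most 1000 keys per call; bucket.list() pages automatically.
for key in b.get_all_keys():
    directory, filename = os.path.split(key.name)
    if directory:
        folders.add(directory)
print(sorted(folders))  # unique "folder" prefixes to store in MySQL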
