Create directories in Amazon S3 using Python, boto3

I know S3 buckets don't really have directories because the storage is flat. But it should be possible to create directories programmatically with Python/boto3; I just don't know how. I found this in the documentation:
"Although S3 storage is flat: buckets contain keys, S3 lets you impose a directory tree structure on your bucket by using a delimiter in your keys.
For example, if you name a key ‘a/b/f’, and use ‘/’ as the delimiter, then S3 will consider that ‘a’ is a directory, ‘b’ is a sub-directory of ‘a’, and ‘f’ is a file in ‘b’."
I can create files in an S3 bucket with:
self.client.put_object(Bucket=bucketname,Key=filename)
but I don't know how to create a directory.

Just a little modification in the key name is required:
self.client.put_object(Bucket=bucketname, Key=filename)
should be changed to
self.client.put_object(Bucket=bucketname, Key='directoryname/' + filename)
That's all.
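For example, a minimal, self-contained sketch of that change (the bucket, directory, and file names here are hypothetical); the "directory" exists only as a prefix inside the key:
import boto3

client = boto3.client('s3')
# 'my-directory/' is just part of the key; the console will render it as a folder
client.put_object(Bucket='my-bucket', Key='my-directory/my-file.txt', Body=b'hello')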

If you read the API documentation, you should be able to do this:
import boto3

s3 = boto3.client("s3")
BucketName = "mybucket"
myfilename = "myfile.dat"
KeyFileName = "a/b/c/d/{fname}".format(fname=myfilename)

# read the local file and upload it under the slash-delimited key
with open(myfilename, "rb") as f:
    object_data = f.read()

s3.put_object(Body=object_data, Bucket=BucketName, Key=KeyFileName)
Honestly, it is not a "real" directory; the key is just a formatted string whose structure is used for organisation.

Adding a forward slash / to the end of the key name to create a directory didn't work for me:
client.put_object(Bucket="foo-bucket", Key="test-folder/")
You have to supply the Body parameter in order to create the directory:
client.put_object(Bucket='foo-bucket', Body='', Key='test-folder/')
Source: ryantuck in boto3 issue
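To check the result, a small sketch following the snippet above (the bucket name is a placeholder): the zero-byte "test-folder/" object shows up as a common prefix when you list with a delimiter, which is how the console renders folders:
import boto3

client = boto3.client('s3')
# list only top-level prefixes; the empty folder marker appears here
resp = client.list_objects_v2(Bucket='foo-bucket', Delimiter='/')
for p in resp.get('CommonPrefixes', []):
    print(p['Prefix'])  # e.g. 'test-folder/'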

Related

get just sub folder name in bucket s3 [duplicate]

I have boto code that collects S3 sub-folders in levelOne folder:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket("MyBucket")
for level2 in bucket.list(prefix="levelOne/", delimiter="/"):
    print(level2.name)
Please help me find the equivalent functionality in boto3. The code should not iterate through all S3 objects, because the bucket has a very large number of objects.
If you are simply seeking a list of folders, then use CommonPrefixes returned when listing objects. Note that a Delimiter must be specified to obtain the CommonPrefixes:
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter='/')
for prefix in response['CommonPrefixes']:
    print(prefix['Prefix'][:-1])
If your bucket has a HUGE number of folders and objects, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
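Also note that a single list_objects_v2 call returns at most 1,000 entries. If there are more top-level prefixes than that, a paginator can collect the CommonPrefixes across pages (a sketch; the bucket name is a placeholder):
import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

# each page contributes its own CommonPrefixes entries
for page in paginator.paginate(Bucket='BUCKET-NAME', Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        print(prefix['Prefix'][:-1])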
I think the following should be equivalent:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('MyBucket')
for object in bucket.objects.filter(Prefix="levelOne/", Delimiter="/"):
    print(object.key)

S3 boto3 delete files except a specific file

Probably this is just a newbie question.
I have Python code using the boto3 SDK, and I need to delete all files from an S3 bucket except one file.
The issue is that a user is updating this S3 bucket and places files into some folders. After I copy those files I need to delete them, and the problem is that the folders get deleted as well, since there is no real concept of folders on cloud providers. I need to keep the "folder" structure intact. I was thinking of placing a dummy file inside each "folder" and excluding that file from deletion.
Is this something doable?
If you create a zero-byte object with the same name as the folder you want ending in a /, it will show up as an empty folder in the AWS Console, and other tools that enumerate objects delimited by prefix will see the prefix in their list of common prefixes:
s3 = boto3.client('s3')
s3.put_object(Bucket='example-bucket', Key='example/folder/', Body=b'')
Then, as you enumerate the list of objects to delete them, ignore any object that ends in a /, since this will just be the markers you're using for folders:
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='example-bucket', Prefix='example/folder/')
for cur in resp['Contents']:
    if cur['Key'].endswith('/'):
        print("Ignoring folder marker: " + cur['Key'])
    else:
        print("Should delete object: " + cur['Key'])
        # TODO: Delete this object
for file in files_in_bucket:
    if file.name != file_name_to_keep:
        file.delete()
Could you follow this sort of logic in your script?
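Putting the two ideas together, a sketch (bucket, prefix, and the file to keep are hypothetical) that deletes everything under a prefix except the folder markers and one specific file:
import boto3

s3 = boto3.client('s3')
file_name_to_keep = 'example/folder/keep-me.txt'  # placeholder for the excluded file

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='example-bucket', Prefix='example/folder/'):
    for obj in page.get('Contents', []):
        key = obj['Key']
        # keep zero-byte folder markers and the one file we want to preserve
        if key.endswith('/') or key == file_name_to_keep:
            continue
        s3.delete_object(Bucket='example-bucket', Key=key)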

How to retrieve only the file name in an S3 folder path using pyspark

Hi, I have an AWS S3 bucket in which a few folders and subfolders are defined.
I need to retrieve only the filename, whichever folder it is in. How do I go about it?
s3 bucket name - abc
path - s3://abc/ann/folder1/folder2/folder3/file1
path - s3://abc/ann/folder1/folder2/file2
Code tried so far:
s3 = boto3.client('s3')
lst_obj = s3.list_objects(Bucket='abc', Prefix='ann/')
lst_obj["Contents"]
I'm further looping to get all the contents:
for file in lst_obj["Contents"]:
    # do something...
Here file["Key"] gives me the whole path, but I just need the filename.
Here is an example of how to get the filenames.
import boto3
s3 = boto3.resource('s3')
for obj in s3.Bucket(name='<your bucket>').objects.filter(Prefix='<prefix>'):
    filename = obj.key.split('/')[-1]
    print(filename)
You can just extract the name by splitting the file Key on the / symbol and taking the last element:
for file in lst_obj["Contents"]:
    name = file["Key"].split("/")[-1]
Using list objects, even with a prefix, simply filters for objects whose keys start with that prefix.
What you see as a path in S3 is actually part of the object's key; the key (which acts as a piece of metadata identifying the object) includes what might look like subfolders.
If you want the last part of the object key, you will need to split the key by the separator ('/').
You could do this with file['Key'].rsplit('/', 1)[-1], which would give you the filename.

Can you list all folders in an S3 bucket?

I have a bucket containing a number of folders, and each folder contains a number of images. Is it possible to list all the folders without iterating through all keys (folders and images) in the bucket? I'm using Python and boto.
You can use list() with an empty prefix (first parameter) and a folder delimiter (second parameter) to achieve what you're asking for:
s3conn = boto.connect_s3(access_key, secret_key, security_token=token)
bucket = s3conn.get_bucket(bucket_name)
folders = bucket.list('', '/')
for folder in folders:
    print(folder.name)
Remark:
In S3 there is no such thing as "folders". All you have is buckets and objects.
The objects represent files. When you name a file name-of-folder/name-of-file, it will look as if there is a file name-of-file that resides inside a folder name-of-folder - but in reality there is no such thing as the "folder".
You can also use the AWS CLI (Command Line Interface): the command aws s3 ls s3://<bucket-name> will list only the "folders" at the first level of the bucket.
Yes! You can list by using a prefix and delimiter on the keys. Have a look at the following documentation:
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
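Since the other answer here uses the legacy boto library, a boto3 equivalent might look like this (a sketch; the bucket name is a placeholder). Listing with '/' as the delimiter returns the first-level "folders" as CommonPrefixes:
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='bucket-name', Delimiter='/')
for prefix in resp.get('CommonPrefixes', []):
    print(prefix['Prefix'])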

How should I create an index of files using boto and S3 with Python/Django

I have a folder structure with 2000 files on S3.
I want to run a program every week that gets the list of files and folders from S3 and populates a database.
Then I use that database to show the same folder structure on the site.
I have two problems:
How can I get the list of folders from there and then store them in MySQL? Do I need to grab all the file names and then split them on "/"? But it looks difficult to see which files belong to which folders. I found this https://stackoverflow.com/a/17096755/1958218 but could not find where the listObjects() function is.
Doesn't the get_all_keys() method of the S3 bucket do what you need?
s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')
keys = b.get_all_keys()
Then iterate over the keys, do os.path.split, and de-duplicate, for example:
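A sketch of that "iterate and de-duplicate" step, using the same legacy boto calls as above (the bucket name is a placeholder); the resulting set of folder paths is what you would insert into MySQL:
import os
import boto

s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')

folders = set()
for key in b.get_all_keys():
    # os.path.split separates the "folder" prefix from the file name
    folder, filename = os.path.split(key.name)
    if folder:
        folders.add(folder)

for folder in sorted(folders):
    print(folder)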
