get just sub folder name in bucket s3 [duplicate] - python

I have boto code that collects S3 sub-folders in the levelOne folder:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket("MyBucket")
for level2 in bucket.list(prefix="levelOne/", delimiter="/"):
    print(level2.name)
Please help me find the equivalent functionality in boto3. The code should not iterate through all S3 objects, because the bucket has a very large number of objects.

If you are simply seeking a list of folders, then use CommonPrefixes returned when listing objects. Note that a Delimiter must be specified to obtain the CommonPrefixes:
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter='/')
for prefix in response['CommonPrefixes']:
    print(prefix['Prefix'][:-1])
If your bucket has a HUGE number of folders and objects, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
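Short of that, if there are more than 1,000 sub-folders at that level, a single list_objects_v2 call will not return them all. A minimal sketch, reusing the bucket and prefix names from the question, that collects CommonPrefixes across pages with a paginator:
import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

# Each page may carry its own CommonPrefixes entry; gather them all.
sub_folders = []
for page in paginator.paginate(Bucket='MyBucket', Prefix='levelOne/', Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        sub_folders.append(prefix['Prefix'][:-1])  # strip the trailing '/'

print(sub_folders)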

I think the following should be equivalent:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('MyBucket')
for object in bucket.objects.filter(Prefix="levelOne/", Delimiter="/"):
    print(object.key)

Related

boto3: How to Find Prefix of S3 Object and Update Object Given the Filename?

I have a populated S3 bucket and a filename I want to search for and update. The bucket has an existing list of objects (with prefixes). I am trying to make an API call to the bucket to update a given object. However, I need to search through all the objects in the bucket (including all prefixes), and then update it (via an s3 file upload) if it matches the current filename.
For example, my bucket has the following in root:
cat2.jpg
cat3.jpg
PRE CAT_PICS/cat.jpg
PRE CAT_PICS/cat2.jpg
PRE CAT_PICS/MORE_CAT_PICS/cat3.jpg
If I want to "recursively" search through all objects, match, and update cat3.jpg how could I do that? Additionally, is there a way to extract the prefix of the found object? I have some code already but I am not sure if it is correct. It also cannot cover more than 1000 objects since it lacks pagination:
My Current Code:
import boto3
FILE_TO_UPDATE = "cat3.jpg"
s3_bucket = "example_bucket2182023"
s3_client = boto3.client("s3")
my_bucket = boto3.resource("s3").Bucket(s3_bucket)
for my_bucket_object in my_bucket.objects.all():
    if my_bucket_object.key.endswith(FILE_TO_UPDATE):
        try:
            with open(FILE_TO_UPDATE, "rb") as f:  # upload the local file's contents
                s3_client.put_object(
                    Bucket=s3_bucket,
                    Key=my_bucket_object.key,
                    Body=f,
                )
        except Exception:
            print(f"Error uploading {my_bucket_object.key} to {s3_bucket}")
        print(my_bucket_object.key)
For pagination, you can use get_paginator() from boto3 S3 client. See reference here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators
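As a minimal sketch of that approach, assuming the bucket name from the question and that the matching object should be overwritten with a local file of the same name, pagination and prefix extraction could look like this:
import boto3

FILE_TO_UPDATE = "cat3.jpg"
s3_bucket = "example_bucket2182023"
s3_client = boto3.client("s3")

paginator = s3_client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=s3_bucket):  # handles more than 1000 objects
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(FILE_TO_UPDATE):
            # The prefix is everything up to (and including) the last '/'.
            prefix = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
            print(f"Found {key} (prefix: {prefix!r})")
            with open(FILE_TO_UPDATE, "rb") as f:
                s3_client.put_object(Bucket=s3_bucket, Key=key, Body=f)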

Find all JSON files within S3 Bucket

Is it possible to find all .json files within an S3 bucket, where the bucket itself can have multiple sub-directories?
My bucket includes multiple sub-directories, and I would like to collect all the JSON files inside them in order to iterate over them and parse specific keys/values.
Here's the solution (it uses the boto3 module):
import boto3
s3 = boto3.client('s3') # Create the connection to your bucket
objs = s3.list_objects_v2(Bucket='my-bucket')['Contents']
files = [obj for obj in objs if obj['Key'].endswith('.json')]  # json only
The syntax for the list_objects_v2 function in boto3 can be found here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2
Note that only the first 1000 keys are returned. To retrieve more than 1000 keys and fully exhaust the list, you can use the Paginator class:
s3 = boto3.client('s3')  # Create the connection to your bucket
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='my-bucket')

files = []
for page in pages:
    page_files = [obj for obj in page['Contents'] if obj['Key'].endswith('.json')]  # json only
    files.extend(page_files)
Note: I recommend using a function that uses yield to lazily iterate over the files instead of returning the whole list, especially if the number of JSON files in your bucket is extremely large (see the sketch below).
Alternatively, you can also use the ContinuationToken parameter (check the boto3 reference linked above).
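As a minimal sketch of the yield-based approach mentioned above, assuming the same 'my-bucket' placeholder, a generator can page through the bucket and emit matching keys lazily:
import boto3

def iter_json_keys(bucket_name):
    """Yield the key of every .json object in the bucket, one page at a time."""
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            if obj['Key'].endswith('.json'):
                yield obj['Key']

for key in iter_json_keys('my-bucket'):
    print(key)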

Include only .gz extension files from S3 bucket

I want to process/download .gz files from S3 bucket. There are more than 10,000 files on S3 so I am using
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objects = bucket.objects.all()
for object in objects:
    print(object.key)
This also lists .txt files, which I want to avoid. How can I do that?
The easiest way to filter objects by name or suffix is to do it within Python, such as using .endswith() to include/exclude objects.
You can Filter by Prefix, but not by suffix.
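A minimal sketch of that approach, assuming the same 'my-bucket' name from the question; the suffix check happens client-side after the keys are listed:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')

# List everything (optionally narrowed with Prefix), then keep only .gz keys.
for obj in bucket.objects.all():
    if obj.key.endswith('.gz'):
        print(obj.key)
        # bucket.download_file(obj.key, obj.key.split('/')[-1])  # download if needed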

Download latest uploaded file from amazon s3 using boto3 in python

I have a few csv files inside one of my buckets on Amazon S3.
I need to download the latest uploaded csv file.
How can I achieve this using boto3 in Python?
Thanks.
S3 doesn't have an API for listing files ordered by date.
However, if you indeed have only a few, you can list the files in the bucket and order them by last modification time.
import boto3

s3Client = boto3.client('s3')
bucketList = s3Client.list_objects(Bucket='MyBucket')['Contents']  # notice this is up to 1000 files
orderedList = sorted(bucketList, key=lambda k: k['LastModified'])
lastUpdatedKey = orderedList[-1]['Key']
latestObject = s3Client.get_object(Bucket='MyBucket', Key=lastUpdatedKey)
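If the bucket can contain more than 1,000 objects, a sketch of the same idea with a paginator (the bucket name is still a placeholder), tracking the newest object across all pages and downloading it:
import boto3

s3Client = boto3.client('s3')
paginator = s3Client.get_paginator('list_objects_v2')

latest = None
for page in paginator.paginate(Bucket='MyBucket'):
    for obj in page.get('Contents', []):
        if latest is None or obj['LastModified'] > latest['LastModified']:
            latest = obj

if latest is not None:
    # Save the newest object locally under its base file name.
    s3Client.download_file('MyBucket', latest['Key'], latest['Key'].split('/')[-1])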

Create directories in Amazon S3 using python, boto3

I know S3 buckets don't really have directories, because the storage is flat. But it should be possible to create directories programmatically with python/boto3, though I don't know how. I saw this in some documentation:
"Although S3 storage is flat: buckets contain keys, S3 lets you impose a directory tree structure on your bucket by using a delimiter in your keys.
For example, if you name a key ‘a/b/f’, and use ‘/’ as the delimiter, then S3 will consider that ‘a’ is a directory, ‘b’ is a sub-directory of ‘a’, and ‘f’ is a file in ‘b’."
I can create just files in an S3 bucket with:
self.client.put_object(Bucket=bucketname,Key=filename)
but I don't know how to create a directory.
Just a little modification in the key name is required. self.client.put_object(Bucket=bucketname, Key=filename)
this should be changed to
self.client.put_object(Bucket=bucketname, Key=directoryname + "/" + filename)
That's all.
If you read the API documentation, you should be able to do this.
import boto3

s3 = boto3.client("s3")
BucketName = "mybucket"
myfilename = "myfile.dat"
KeyFileName = "/a/b/c/d/{fname}".format(fname=myfilename)
with open(myfilename, "rb") as f:
    object_data = f.read()
s3.put_object(Body=object_data, Bucket=BucketName, Key=KeyFileName)
Honestly, it is not a "real directory", just a key-naming convention used to organise objects.
Adding a forward slash / to the end of the key name to create a directory didn't work for me:
client.put_object(Bucket="foo-bucket", Key="test-folder/")
You have to supply the Body parameter in order to create the directory:
client.put_object(Bucket='foo-bucket', Body='', Key='test-folder/')
Source: ryantuck in boto3 issue
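A small sketch, assuming a hypothetical 'foo-bucket', that creates such a zero-byte "folder" key and then confirms it shows up as a common prefix when listing with a delimiter:
import boto3

client = boto3.client('s3')

# Create the zero-byte "directory" marker.
client.put_object(Bucket='foo-bucket', Body='', Key='test-folder/')

# It now appears as a folder-like common prefix at the bucket root.
response = client.list_objects_v2(Bucket='foo-bucket', Delimiter='/')
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])  # e.g. test-folder/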
