Include only .gz extension files from S3 bucket - python

I want to process/download .gz files from an S3 bucket. There are more than 10,000 files on S3, so I am using:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objects = bucket.objects.all()
for object in objects:
    print(object.key)
This also lists .txt files, which I want to avoid. How can I exclude them?

The easiest way to filter objects by name or suffix is to do it within Python, such as using .endswith() to include/exclude objects, as in the sketch below.
You can filter by Prefix on the server side, but not by suffix.
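A minimal sketch of that approach, reusing the bucket name from the question (assumed):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')

# iterate over every object, but only act on keys ending in .gz
for obj in bucket.objects.all():
    if obj.key.endswith('.gz'):
        print(obj.key)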

Related

get just sub folder name in bucket s3 [duplicate]

I have boto code that collects S3 sub-folders in the levelOne folder:

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket("MyBucket")
for level2 in bucket.list(prefix="levelOne/", delimiter="/"):
    print(level2.name)
Please help me find similar functionality in boto3. The code should not iterate through all S3 objects, because the bucket contains a very large number of objects.
If you are simply seeking a list of folders, then use CommonPrefixes returned when listing objects. Note that a Delimiter must be specified to obtain the CommonPrefixes:

import boto3

s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter='/')
for prefix in response['CommonPrefixes']:
    print(prefix['Prefix'][:-1])
If your bucket has a HUGE number of folders and objects, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
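Short of that, note that a single list_objects_v2 call returns at most 1,000 entries; a paginator can follow the continuation tokens for you. A minimal sketch of the same listing:

import boto3

s3_client = boto3.client('s3')

# the paginator transparently issues follow-up requests
# whenever a response is truncated at 1,000 entries
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='BUCKET-NAME', Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        print(prefix['Prefix'][:-1])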
I think the following should be equivalent:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('MyBucket')
for object in bucket.objects.filter(Prefix="levelOne/", Delimiter="/"):
    print(object.key)

How to upload an empty folder to S3 using Boto3?

My program backs up all the files in the directory, except for empty folders. How do you upload an empty folder to S3 using Boto3 in Python?
for dirName, subdirList, fileList in os.walk(path):
    # for each directory, walk through all files
    for fname in fileList:
        current_key = dirName[dir_str_index:] + "\\" + fname
        current_key = current_key.replace("\\", "/")
S3 doesn't really have folders:
Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects that have names that begin with a common string).
Since folders are just part of object names, you can't have empty folders in S3.
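That said, the S3 console simulates empty folders by storing a zero-byte placeholder object whose key ends in / (the same trick shown in a later answer on this page). A sketch that applies it to the question's os.walk loop; path and dir_str_index are taken from the question, and the bucket name is a placeholder:

import os
import boto3

s3_client = boto3.client('s3')

for dirName, subdirList, fileList in os.walk(path):
    if not fileList and not subdirList:
        # a zero-byte object whose key ends in "/" shows up
        # as an (empty) folder in the S3 console
        placeholder_key = dirName[dir_str_index:].replace("\\", "/") + "/"
        s3_client.put_object(Bucket="my-bucket", Body=b"", Key=placeholder_key)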

Download latest uploaded file from amazon s3 using boto3 in python

I have a few csv files inside one of my buckets on Amazon S3.
I need to download the latest uploaded csv file.
How can I achieve this using boto3 in Python?
Thanks.
S3 doesn't have an API for listing files ordered by date.
However, if you indeed have only a few, you can list the files in the bucket and order them by last modification time:

response = s3Client.list_objects(Bucket='<MyBucket>')  # notice this returns up to 1000 files
orderedList = sorted(response['Contents'], key=lambda obj: obj['LastModified'])
lastUpdatedKey = orderedList[-1]['Key']
object = s3Client.get_object(Bucket='<MyBucket>', Key=lastUpdatedKey)
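If the bucket can grow past 1,000 objects, a paginated variant of the same idea (a sketch; 'my-bucket' is a placeholder name):

import boto3

s3Client = boto3.client('s3')

# walk every page of results and keep the most recently modified object
paginator = s3Client.get_paginator('list_objects_v2')
latest = None
for page in paginator.paginate(Bucket='my-bucket'):
    for obj in page.get('Contents', []):
        if latest is None or obj['LastModified'] > latest['LastModified']:
            latest = obj

if latest is not None:
    response = s3Client.get_object(Bucket='my-bucket', Key=latest['Key'])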

Create directories in Amazon S3 using python, boto3

I know S3 buckets don't really have directories, because the storage is flat. But it should be possible to create directories programmatically with python/boto3; I just don't know how. I saw this in the documentation:
"Although S3 storage is flat: buckets contain keys, S3 lets you impose a directory tree structure on your bucket by using a delimiter in your keys.
For example, if you name a key ‘a/b/f’, and use ‘/’ as the delimiter, then S3 will consider that ‘a’ is a directory, ‘b’ is a sub-directory of ‘a’, and ‘f’ is a file in ‘b’."
I can create files in an S3 bucket with:

self.client.put_object(Bucket=bucketname, Key=filename)

but I don't know how to create a directory.
Just a little modification in the key name is required. This:

self.client.put_object(Bucket=bucketname, Key=filename)

should be changed to:

self.client.put_object(Bucket=bucketname, Key=directoryname + '/' + filename)

That's all.
If you read the API documentation, you should be able to do this:

import boto3

s3 = boto3.client("s3")
BucketName = "mybucket"
myfilename = "myfile.dat"
KeyFileName = "/a/b/c/d/{fname}".format(fname=myfilename)
with open(myfilename) as f:
    object_data = f.read()
s3.put_object(Body=object_data, Bucket=BucketName, Key=KeyFileName)
Honestly, it is not a "real directory"; the key is just a preformatted string structure used for organisation.
Adding a forward slash / to the end of the key name, to create a directory, didn't work for me:

client.put_object(Bucket="foo-bucket", Key="test-folder/")

You have to supply the Body parameter in order to create the directory:

client.put_object(Bucket='foo-bucket', Body='', Key='test-folder/')
Source: ryantuck in boto3 issue

How should I create the index of files using boto and S3 with Python/Django

I have a folder structure with 2000 files on S3.
Every week I want to run a program that gets the list of files and folders from S3 and populates a database.
Then I use that database to show the same folder structure on the site.
I have two problems:
How can I get the list of folders from there and then store them in MySQL? Do I need to grab all the file names and then split them on "/"? It seems difficult to tell which files belong to which folders. I have found this https://stackoverflow.com/a/17096755/1958218 but could not find where the listObjects() function is.
Doesn't the get_all_keys() method of the S3 bucket do what you need?

s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')
keys = b.get_all_keys()

Then iterate over the keys, apply os.path.split, and de-duplicate, as sketched below...
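A minimal sketch of that idea, using the boto2 API from the answer (the bucket name is assumed):

import os
import boto

s3 = boto.connect_s3()
b = s3.get_bucket('bucketname')

# collect every ancestor prefix of every key as a "folder"
folders = set()
for key in b.get_all_keys():  # returns up to 1000 keys per call
    head, _ = os.path.split(key.name)
    while head:
        folders.add(head)
        head, _ = os.path.split(head)

for folder in sorted(folders):
    print(folder)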
