Handling exception for S3 bucket fetch with boto3 - Python

I am doing something like below to get all the files inside my S3 bucket.
for obj in bucket.objects.filter(Delimiter='/', Prefix='uploads/{}/'.format(name)):  # to get data from subfolder dir
    if obj.key.endswith(('.xlsx', '.csv')):
        paths.append(obj.key)
I need to handle the case where either there are no files inside the folder or the folder (uploads/{}/) itself doesn't exist. How do I handle this?

You can try something like:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)  # the Bucket resource being filtered
try:
    for obj in bucket.objects.filter(Delimiter='/',
                                     Prefix='uploads/{}/'.format(name)):  # to get data from subfolder dir
        if obj.key.endswith(('.xlsx', '.csv')):
            paths.append(obj.key)
except s3.meta.client.exceptions.NoSuchKey:
    print("no such key in bucket")

Related

How can I get ONLY files from S3 with python aioboto3 or boto3?

I have this code and I want only the paths that end in a file, without intermediate empty folders. For example:
data/folder1/folder2
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
From those paths I only want:
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
I am using this code, but it gives me paths that end in directories as well:
subfolders = set()
current_path = None
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
objects = result.get("Contents")
try:
    for obj in objects:
        current_path = os.path.dirname(obj["Key"])
        if current_path not in subfolders:
            subfolders.add(current_path)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc
I would recommend using the boto3 Bucket resource here, because it simplifies pagination.
Here is an example of how to get a list of all files in an S3 bucket:
import boto3
bucket = boto3.resource("s3").Bucket("mybucket")
objects = bucket.objects.all()
files = [obj.key for obj in objects if not obj.key.endswith("/")]
print("Files:", files)
It's worth noting that getting a list of all folders and subfolders in an S3 bucket is a more difficult problem to solve, mainly because folders don't typically exist in S3. They are logically present, but not physically present, because of the presence of objects with a given hierarchical key such as dogs/small/corgi.png. For ideas, see retrieving subfolder names in S3 bucket.
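That said, here is a rough sketch of one common approach, assuming a bucket named mybucket: listing with a Delimiter makes S3 roll keys up into CommonPrefixes, which behave like top-level folders.
import boto3

client = boto3.client("s3")
paginator = client.get_paginator("list_objects_v2")
folders = set()
for page in paginator.paginate(Bucket="mybucket", Delimiter="/"):
    # CommonPrefixes holds the rolled-up "folder" names at this level
    for cp in page.get("CommonPrefixes", []):
        folders.add(cp["Prefix"])
print("Top-level folders:", folders)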
Can't you check whether there is an extension or not?
By the way, you don't need to check for the key's existence in the set, since a set always keeps its items unique.
list_objects does not return any indicator of whether an item is a folder or a file, so this looks like the practical way.
Please check: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
files = set()
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
objects = result.get("Contents", [])
try:
    for obj in objects:
        current_path = obj["Key"]
        # Keep only keys whose final component has an extension (i.e. files)
        if "." in os.path.basename(current_path):
            files.add(current_path)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc
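As an alternative to the extension heuristic, a hedged sketch: folder placeholder objects are the keys that end in "/", so filtering on that avoids misclassifying extensionless files (e.g. Makefile). It reuses the result from above.
files = {
    obj["Key"]
    for obj in result.get("Contents", [])
    if not obj["Key"].endswith("/")  # keys ending in "/" are folder placeholders
}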

Write to a specific folder in an S3 bucket using boto3 in Python

The question
I have a file that I would like to write to a specific folder within my S3 bucket; let's call the bucket bucket and the folder folderInBucket.
I am using the boto3 library to achieve this and have the following function:
What I have done
def upload_file(file_name, bucket, object_name=None):
    if object_name is None:
        object_name = file_name
    s3 = b3.client('s3')
    try:
        response = s3.upload_file(file_name, bucket, Key='bulk_movies.json')
    except ClientError as e:
        print(e)
        return False
    print('Success!')
    return True

upload_file('./s3/bulk_movies.json', 'bucket')
I have also tried calling the function with bucket/folderInBucket as the second parameter, but this produces an error (sort of as expected, actually).
Gaps in understanding
This function was more or less ripped from the boto3 documentation. The docs don't really specify how to write into a specific folder within an S3 bucket. I know for sure that the file itself writes fine into the bucket's root, because the code outlined above works without issue.
I was able to get this to work by modifying your function and the call a bit to include the object_name as the absolute path to the file on the bucket.
import boto3 as b3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name):
    if object_name is None:
        object_name = file_name
    s3 = b3.client('s3')
    try:
        response = s3.upload_file(file_name, bucket, Key=object_name)
    except ClientError as e:
        print(e)
        return False
    print('Success!')
    return True

upload_file('bulk_movies.json', '<bucket-name>', 'folderInBucket/bulk_movies.json')
Please share the error if you're still running into one. As far as the file upload is concerned, it should work with what you have already done.
S3 is "blob storage" and as such the concept of a "folder" doesn't really exist. When uploading a file you just provide the complete prefix to the destination string.
Here's an example:
import boto3
s3 = boto3.resource('s3')
folder_name = 'folder_name'
s3.meta.client.upload_file('helloworld.txt', 'bucketname', folder_name + '/helloworld.txt')
There is no such thing as a "folder" in S3.
What you consider a "folder" is part of the file name.
URL example: s3://mybucket/folder1/folder2/file.txt
In the example above the file name is 'folder1/folder2/file.txt'
Buckets are not folders, but they act like folders. You store objects in an S3 bucket, and an object has a key. The file is uploaded by default with private permissions; you can set the ACL to public-read, allowing everyone in the world to see the file.
s3.upload_file(
    Bucket='gid-requests',
    Filename='bulk_movies.json',
    Key='folderInBucket/bulk_movies.json',
    ExtraArgs={'ACL': 'public-read'})
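Once uploaded with a public-read ACL, the object should be reachable at a virtual-hosted-style URL; a small illustration using the names from the example above (the exact host can vary by region and endpoint configuration):
bucket, key = 'gid-requests', 'folderInBucket/bulk_movies.json'
url = f'https://{bucket}.s3.amazonaws.com/{key}'
print(url)  # https://gid-requests.s3.amazonaws.com/folderInBucket/bulk_movies.json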

Uploading Files to AWS S3 Bucket Folder in Python Causes Regex Error

I have an AWS S3 bucket called task-details and a folder called archive, so the S3 URI is s3://task-details/archive/ and the ARN is arn:aws:s3:::task-details/archive/. I'm trying to use the upload_file method from Python's boto3 package to upload a CSV file to a folder within the S3 bucket.
Below is the method I am using to try and upload data to the folder, but I keep getting a regex error, which makes me think that I can't upload data to a specific folder in a bucket. Does anyone know how to do this?
Method:
import logging
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name
    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
My Code (I've also tried bucket = s3://task-details/archive/ and bucket = task-details/archive/):
upload_file(
    file_name = filepath,
    bucket = "arn:aws:s3:::task-details/archive/",
    object_name = object_name
)
The Error:
Invalid bucket name "arn:aws:s3:::task-details/archive/": Bucket name must match the regex
"^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
The API call wants the bucket name or ARN. Your bucket name is task-details and your bucket ARN is arn:aws:s3:::task-details.
You use the Key parameter when calling upload_file to specify the object's key, for example archive/cats/persian.png. Note that the S3 object key is not simply the object/file name but also includes the prefix/folder.
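For example, a minimal sketch (the local path is illustrative):
import boto3

s3_client = boto3.client('s3')
# The bucket argument is just the name; the "folder" lives entirely in the key
s3_client.upload_file('persian.png', 'task-details', 'archive/cats/persian.png')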
The problem was that I needed to add the folder path as part of the object_name instead of the bucket.
Fixed Code:
upload_file(
    file_name = filepath,
    bucket = "arn:aws:s3:::task-details",
    object_name = "archive/" + object_name
)
In my case, the other answers didn't resolve the error. I had to change my bucket name from the arn ("arn:aws:s3:::bucket-name") to the bucket name alone ("bucket-name").
Adding a potentially helpful explanation from @jarmod:
Interesting that the current boto3 docs say it’s the bucket name but the error message in the original post above indicated that it could be an ARN. Personally I always use bucket name and maybe the error message above is a red herring. Certainly it seems that ARN is not supported.
I was trying to upload the contents of a zip file. Also, I wanted to upload to a folder inside the bucket. So this is my working solution:
s3.upload_fileobj(
    Bucket='user-data',
    Key='my_extracted-files/' + file_name,  # file_name is the name of the file
    Fileobj=zop  # this is a file object
)
My guess is that the naming format of the Bucket and Key params is the same for the upload_file method too.
Note: don't forget to add Put permissions to your policy.

Download Entire Content of a Subfolder in an S3 Bucket

I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".
Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all its contents?
For example --> sample-data/a/foo.txt,more_files/foo1.txt
In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which contains foo1.txt.
I know how to download a single file. For instance, if I wanted foo.txt I would do the following.
s3 = boto3.client('s3')
s3.download_file("sample-data", "a/foo.txt", "foo.txt")
However, I am wondering if I can download the folder called a and all its contents entirely. Any help would be appreciated.
I think your best bet would be the awscli
aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
From the docs:
--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
EDIT:
To do this with boto3, try this:
import os
import errno
import boto3

client = boto3.client('s3')

def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'
    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually; 'Contents' is absent when nothing matches
        for key in result.get('Contents', []):
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)

download_dir('your_bucket', 'your_folder', 'destination')
List all the objects in the folder you want to download, then iterate over them and download each file.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
)
The response is of type dict. The key that contains the list of the file names is "Contents"
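Continuing the sketch, each listed key can then be downloaded individually (the destination directory here is illustrative and assumed to exist):
import os

for obj in response.get('Contents', []):
    key = obj['Key']
    if key.endswith('/'):  # skip zero-byte folder placeholders
        continue
    s3.download_file(BUCKET, key, os.path.join('destination', os.path.basename(key)))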
Here is more information:
list all files in a bucket
boto3 documentation
I am not sure if this is the fastest solution, but it can help you.

S3 Delete files inside a folder using boto3

How can we delete files inside an S3 Folder using boto3?
P.S. Only the files should be deleted; the folder should remain.
You would have to use delete_object():
import boto3

s3_client = boto3.client('s3')
response = s3_client.delete_object(
    Bucket='my-bucket',
    Key='invoices/January.pdf'
)
If you are asking how to delete ALL files within a folder, then you would need to loop through all objects with a given Prefix:
import boto3

s3_client = boto3.client('s3')
BUCKET = 'my-bucket'
PREFIX = 'folder1/'

response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in response.get('Contents', []):
    print('Deleting', obj['Key'])
    s3_client.delete_object(Bucket=BUCKET, Key=obj['Key'])
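One caveat, offered as a hedged sketch: list_objects_v2 returns at most 1,000 keys per call, so for larger "folders" you would paginate, and delete_objects can remove up to 1,000 keys per request:
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    contents = page.get('Contents', [])
    if contents:
        # Batch delete: up to 1,000 keys per delete_objects request
        s3_client.delete_objects(
            Bucket=BUCKET,
            Delete={'Objects': [{'Key': obj['Key']} for obj in contents]},
        )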
Also, please note that folders do not actually exist in Amazon S3. The Key (filename) of an object contains the full path of the object. If necessary, you can create a zero-length file with the name of a folder to make the folder 'appear', but this is not necessary. Merely creating an object in a given path will make any parent folders sort of 'appear', but they will 'disappear' when the object is deleted (since folders don't actually exist).
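For completeness, a zero-length placeholder like the one described above can be created with put_object (bucket and key are placeholders):
s3_client.put_object(Bucket='my-bucket', Key='folder1/')  # zero-byte "folder" marker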
