Write to a specific folder in an S3 bucket using boto3 python

The question
I have a file that I would like to write to a specific folder within my S3 bucket. Let's call the bucket bucket and the folder folderInBucket.
I am using the boto3 library to achieve this and have the following function:
What I have done
def upload_file(file_name, bucket, object_name=None):
    if object_name is None:
        object_name = file_name
    s3 = b3.client('s3')
    try:
        response = s3.upload_file(file_name, bucket, Key='bulk_movies.json')
    except ClientError as e:
        print(e)
        return False
    print('Success!')
    return True
upload_file('./s3/bulk_movies.json', 'bucket')
I have also tried passing bucket/folderInBucket as the second parameter when calling the function, but this produces an error (more or less as expected).
Gaps in understanding
This function was more or less ripped from the boto3 documentation. The docs don't really specify how to write into a specific folder within our S3 bucket. I know for sure the file itself is able to write fine into the bucket's main directory because the code outlined above works without issue.

I was able to get this to work by modifying your function and the call slightly, so that object_name is the full key (path) of the file in the bucket.
import boto3 as b3
from botocore.exceptions import ClientError
def upload_file(file_name, bucket, object_name=None):
    # Fall back to the local file name when no key is given
    if object_name is None:
        object_name = file_name
    s3 = b3.client('s3')
    try:
        # The key carries the "folder" as a prefix
        response = s3.upload_file(file_name, bucket, Key=object_name)
    except ClientError as e:
        print(e)
        return False
    print('Success!')
    return True
upload_file('bulk_movies.json', '<bucket-name>', 'folderInBucket/bulk_movies.json')
Please share the error if you're still running into one. As far as the file upload is concerned, it should work with what you have already done.

S3 is "blob storage" and as such the concept of a "folder" doesn't really exist. When uploading a file you just provide the complete key, including the prefix, as the destination string.
Here's an example:
import boto3
s3 = boto3.resource('s3')
folder_name = 'folder_name'
s3.meta.client.upload_file('helloworld.txt', 'bucketname', folder_name + '/helloworld.txt')

There is no such thing as a "folder" in S3.
What you consider a "folder" is part of the file name (the object key).
URL example: s3://mybucket/folder1/folder2/file.txt
In the example above the file name is 'folder1/folder2/file.txt'.
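To make the "folder is part of the key" idea concrete, here is a minimal sketch of a helper that joins a prefix and a file name into one key. The name build_key is hypothetical and not part of boto3; the result is what you would pass as the Key argument.

```python
import posixpath


def build_key(folder, filename):
    """Join a "folder" prefix and a file name into a single S3 object key.

    build_key is a hypothetical helper for illustration, not a boto3 function.
    """
    # S3 keys use forward slashes on every platform, so posixpath is used
    # instead of os.path. Surrounding slashes are stripped so the key does
    # not start with "/" or contain "//".
    return posixpath.join(folder.strip("/"), filename.lstrip("/"))


print(build_key("folderInBucket", "bulk_movies.json"))
print(build_key("folder1/folder2/", "file.txt"))
```

The returned string can then be used as the key in a call such as s3.upload_file(file_name, bucket, build_key(folder, name)).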

Buckets are not folders, but they act like folders. You store objects in an S3 bucket, and each object has a key. A file is uploaded with private permissions by default. You can set the ACL to public-read, which allows everyone in the world to read the file.
s3.upload_file(
    Bucket='gid-requests',
    Filename='bulk_movies.json',
    Key='folderInBucket/bulk_movies.json',
    ExtraArgs={'ACL': 'public-read'})

Related

Uploading Files to AWS S3 Bucket Folder in Python Causes Regex Error

I have an AWS S3 bucket called task-details and a folder called archive so the S3 URI is s3://task-details/archive/ & ARN of arn:aws:s3:::task-details/archive/. I'm trying to use the upload_file method from Python's boto3 package to upload a CSV file to a folder within the S3 bucket.
Below is the method I am using to try and upload data to the folder but I keep getting a regex error which makes me think that I can't upload data to a specific folder in a bucket. Does anyone know how to do this?
Method:
import logging
import boto3
from botocore.exceptions import ClientError
def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket
    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name
    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
My Code (I've also tried bucket = s3://task-details/archive/ and bucket = task-details/archive/):
upload_file(
    file_name = filepath,
    bucket = "arn:aws:s3:::task-details/archive/",
    object_name = object_name
)
The Error:
Invalid bucket name "arn:aws:s3:::task-details/archive/": Bucket name must match the regex
"^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

The API call wants the bucket name or ARN. Your bucket name is task-details and your bucket ARN is arn:aws:s3:::task-details.
You use the Key parameter when calling upload_file to specify the object's key, for example archive/cats/persian.png. Note that the S3 object key is not simply the object/file name but also includes the prefix/folder.

The problem was that I needed to add the folder path as part of the object_name instead of the bucket.
Fixed Code:
upload_file(
    file_name = filepath,
    bucket = "arn:aws:s3:::task-details",
    object_name = "archive/" + object_name
)

In my case, the other answers didn't resolve the error. I had to change my bucket name from the arn ("arn:aws:s3:::bucket-name") to the bucket name alone ("bucket-name").
Adding a potentially helpful explanation from @jarmod:
Interesting that the current boto3 docs say it’s the bucket name but the error message in the original post above indicated that it could be an ARN. Personally I always use bucket name and maybe the error message above is a red herring. Certainly it seems that ARN is not supported.
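The behaviour described above can be checked offline against the two patterns quoted in the error message. This is only a sketch: it assumes the patterns in that message reflect the client-side validation of the installed botocore version, and the access point ARN below is a made-up example.

```python
import re

# Patterns copied from the error message in the question.
BUCKET_NAME = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")
ACCESS_POINT_ARN = re.compile(
    r"^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
    r"|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}"
    r"[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
)


def passes_bucket_validation(value):
    """Return True if value matches either pattern from the error message."""
    return bool(BUCKET_NAME.match(value) or ACCESS_POINT_ARN.match(value))


candidates = [
    "task-details",                        # plain bucket name
    "task-details/archive/",               # bucket name with a folder appended
    "arn:aws:s3:::task-details",           # bucket ARN
    "arn:aws:s3:::task-details/archive/",  # value from the question
    "arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap",  # hypothetical access point ARN
]
for value in candidates:
    print(value, passes_bucket_validation(value))
```

Only the plain bucket name and the access point ARN match, which is consistent with the bucket ARN being rejected: the ARN alternative in the message covers access points, not buckets.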

I was trying to upload the contents of a zip file, and I also wanted to upload into a folder inside the bucket. This is my working solution:
s3.upload_fileobj(
    Bucket='user-data',
    Key='my_extracted-files/' + file_name,  # file_name is the name of the file
    Fileobj=zop  # this is a file object
)
My guess is that the naming format of the Bucket and Key parameters is the same for upload_file.
Note: Don't forget to add Put permissions to your policy.
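For context, a sketch of how the file objects for such a call might be produced from the zip archive. The function name upload_zip_members is hypothetical, and the client is passed in (for example boto3.client('s3')) so the sketch itself does not depend on AWS access.

```python
import zipfile


def upload_zip_members(client, zip_source, bucket, prefix):
    """Upload each file inside a zip archive under prefix in bucket.

    Hypothetical helper: client is expected to be an S3 client exposing
    upload_fileobj(Fileobj, Bucket, Key); zip_source is a path or a
    binary file object.
    """
    uploaded = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Directory entries carry no data, so they are skipped
            if info.is_dir():
                continue
            key = prefix.rstrip("/") + "/" + info.filename
            # archive.open returns a readable binary file object
            with archive.open(info) as member:
                client.upload_fileobj(member, bucket, key)
            uploaded.append(key)
    return uploaded
```

A call might look like upload_zip_members(boto3.client('s3'), 'files.zip', 'user-data', 'my_extracted-files'), where files.zip is a placeholder name.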

Download Entire Content of a subfolder in a S3 bucket

I have a bucket in s3 called "sample-data". Inside the Bucket I have folders labelled "A" to "Z".
Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all its contents?
For example --> sample-data/a/foo.txt,more_files/foo1.txt
In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which contains foo1.txt.
I know how to download a single file. For instance, if I wanted foo.txt I would do the following:
s3 = boto3.client('s3')
s3.download_file("sample-data", "a/foo.txt", "foo.txt")
However, I am wondering if I can download the folder called a and all its contents in one go. Any help would be appreciated.

I think your best bet would be the awscli:
aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
From the docs:
--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
EDIT:
To do this with boto3 try this:
import os
import errno
import boto3
client = boto3.client('s3')
def assert_dir_exists(path):
    # Create the directory, ignoring the error if it already exists
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'
    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually ('Contents' is absent when nothing matches)
        for key in result.get('Contents', []):
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)
download_dir('your_bucket', 'your_folder', 'destination')

You list all the objects in the folder you want to download. Then iterate file by file and download it.
import boto3
s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
)
The response is a dict. The key that contains the list of objects is "Contents"; the "Key" field of each entry holds the object's key.
Here is more information:
list all files in a bucket
boto3 documentation
I am not sure if this is the fastest solution, but it can help you.
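A single list_objects_v2 call returns at most one page of results, so a sketch of the "list everything" step needs to follow the continuation token. The function name list_keys is hypothetical; the client is passed in (for example boto3.client("s3")).

```python
def list_keys(client, bucket, prefix):
    """Collect every object key under prefix, following continuation tokens.

    Hypothetical helper: client is expected to expose list_objects_v2 the
    way an S3 client created with boto3.client("s3") does.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is missing from the response when nothing matches
        for obj in response.get("Contents", []):
            keys.append(obj["Key"])
        if not response.get("IsTruncated"):
            return keys
        # Ask for the next page on the following iteration
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Each returned key can then be passed to download_file one at a time, as described above.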

Handling exception for S3 bucket fetch with boto3

I am doing something like below to get all the files inside my s3 bucket.
for obj in bucket.objects.filter(Delimiter='/', Prefix='uploads/{}/'.format(name)):  # to get data from subfolder dir
    if obj.key.endswith(('.xlsx', '.csv')):
        paths.append(obj.key)
I need to handle the case where either there are no files inside the folder or the folder (uploads/{}/) itself doesn't exist. How do I handle this?

You can try something like:
s3 = boto3.resource('s3')
try:
    for obj in bucket.objects.filter(Delimiter='/',
                                     Prefix='uploads/{}/'.format(name)):  # to get data from subfolder dir
        if obj.key.endswith(('.xlsx', '.csv')):
            paths.append(obj.key)
except s3.meta.client.exceptions.NoSuchKey:
    print("no such key in bucket")
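As far as I know, filtering on a prefix that matches nothing does not raise NoSuchKey; the loop simply yields no objects, because S3 has no real folders to be missing. Under that assumption, a sketch of an alternative is to collect the keys and check whether the result is empty. The function name collect_paths is hypothetical, and bucket is expected to be a boto3 Bucket resource.

```python
def collect_paths(bucket, name, suffixes=(".xlsx", ".csv")):
    """Return keys under uploads/<name>/ that end with one of suffixes.

    Hypothetical helper: bucket is expected to behave like a boto3 Bucket
    resource, i.e. expose objects.filter(...) yielding items with a .key.
    """
    prefix = "uploads/{}/".format(name)
    return [
        obj.key
        for obj in bucket.objects.filter(Delimiter="/", Prefix=prefix)
        if obj.key.endswith(suffixes)
    ]


def report(paths):
    """Describe the result; an empty list covers both 'no files' and 'no folder'."""
    if not paths:
        return "no matching files under this prefix"
    return "{} file(s) found".format(len(paths))
```

A missing bucket or a permissions problem is a different case: those surface as a botocore ClientError, which can be caught separately around the call.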

Remove absolute path while uploading file on S3

I am trying to upload a file to my S3 bucket using the following code, which works absolutely fine.
#!/usr/bin/python
import os
import boto
import boto.s3.connection
from boto.s3.key import Key
from boto.s3.connection import S3Connection
from datetime import datetime
try:
    conn = boto.s3.connect_to_region('us-east-1',
        aws_access_key_id = 'AKXXXXXXXXXXXXA',
        aws_secret_access_key = 'cXXXXXXXXXXXXXXXXXXXXXXXXX2',
        calling_format = boto.s3.connection.OrdinaryCallingFormat(),
    )
    print conn
    filename = '/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf'
    bucket = conn.get_bucket('bucketName', validate=False)
    key_name = filename
    print "file to upload",key_name
    secure_https_url = 'https://{host}/{bucket}{key}'.format(
        host=conn.server_name(),
        bucket='bucketName',
        key=key_name)
    print "secure_https_url",secure_https_url
    k = bucket.new_key(key_name)
    mp = k.set_contents_from_filename(key_name)
    print "File uploaded successfully"
except Exception,e:
    print str(e)
    print "error"
Now the problem is that, because my file name is '/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf', it creates a hierarchy of folders in the bucket and stores my file under it, so I get the file path https://s3.amazonaws.com/bucketName/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf. I want to change this hierarchy to https://s3.amazonaws.com/bucketName/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf. Is there any option to do this with boto, or should I do it in plain Python? Uploading a file to S3 requires the absolute path of the file, and I am using Django, where this runs as a Celery task.

The function set_contents_from_filename(key_name) stores the file in S3 under whatever the key name is, as is. If your key name contains any / then it will create a hierarchical structure. For your situation I suggest that you create two paths. One is your base local path, which contains all the files that you want to upload to S3 (use os.path.join to create the path). The other is the AWS path, which is the hierarchy you want to create in your S3 bucket. As an example you can declare the AWS path as:
/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04
Then you append the local filename to it to form the key_name parameter that you pass to new_key.
Boto will work fine here.
Example:
Create the S3 path that you want in your S3 storage, and also a base local path that contains all the files that you want to upload. The filename has to be appended to each to form the full path. new_key creates a key, i.e. the path under which the file is stored. set_contents_from_filename takes a local file path and stores that file in S3 under the key (path) created above.
k = bucket.new_key(s3_path + filename)
mp = k.set_contents_from_filename(base_local_path + filename)
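The two-path idea can also be expressed by deriving the key from the local path, so the hierarchy below the base directory is preserved automatically. This is a sketch with a hypothetical helper key_for; it produces keys without a leading slash, which is a choice made here rather than something the answer above prescribes, and the paths in the example are shortened placeholders.

```python
import os
import posixpath


def key_for(local_path, base_local_path, s3_path=""):
    """Build an S3 key from the part of local_path below base_local_path.

    key_for is a hypothetical helper; s3_path is an optional prefix to
    place in front of the relative path.
    """
    # Path of the file relative to the base directory, in local notation
    relative = os.path.relpath(local_path, base_local_path)
    # S3 keys use "/" regardless of the local path separator
    parts = relative.split(os.sep)
    return posixpath.join(s3_path.strip("/"), *parts)


base = os.path.join(os.sep, "home", "rahul", "company")
local = os.path.join(base, "ExtraPaymentDocuments", "Gujarat", "Invoice (7).pdf")
print(key_for(local, base))
```

The local absolute path is still what gets passed to set_contents_from_filename; only the key changes.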

Download S3 Files with Boto

I am trying to set up an app where users can download their files stored in an S3 bucket. I am able to set up my bucket and get the correct file, but it won't download, giving me this error: No such file or directory: 'media/user_1/imageName.jpg'. Any idea why? This seems like a relatively easy problem, but I can't quite seem to get it. I can delete an image properly, so it is able to identify the correct image.
Here's my views.py
def download(request, project_id=None):
    conn = S3Connection('AWS_BUCKET_KEY', 'AWS_SECRET_KEY')
    b = Bucket(conn, 'BUCKET_NAME')
    k = Key(b)
    instance = get_object_or_404(Project, id=project_id)
    k.key = 'media/'+str(instance.image)
    k.get_contents_to_filename(str(k.key))
    return redirect("/dashboard/")

The problem is that you are downloading to a local directory that doesn't exist (media/user_1). You need to either:
Create the directory on the local machine first
Just use the filename rather than a full path
Use the full path, but replace slashes (/) with another character -- this will ensure uniqueness of filename without having to create directories
The last option could be achieved via:
k.get_contents_to_filename(str(k.key).replace('/', '_'))
See also: Boto3 to download all files from a S3 Bucket
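The first option, creating the directory before downloading, can be sketched as follows. The helper name local_path_for and the root argument are hypothetical; the returned path is what would be passed to get_contents_to_filename.

```python
import os


def local_path_for(key, root="."):
    """Map an S3 key to a local path under root and create its parent directory.

    local_path_for is a hypothetical helper for illustration.
    """
    # Split on "/" because S3 keys always use forward slashes
    path = os.path.join(root, *key.split("/"))
    # exist_ok avoids an error when the directory is already there
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```

With this in place, the download call in the view could become k.get_contents_to_filename(local_path_for(str(k.key))).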

Downloading files using boto3 is very simple. Configure your AWS credentials at the system level before using this code.
import boto3
client = boto3.client('s3')
# if your bucket name is mybucket and the file path is test/abc.txt
# then Bucket='mybucket' and Prefix='test'
resp = client.list_objects_v2(Bucket="<your bucket name>", Prefix="<prefix of the s3 folder>")
for obj in resp['Contents']:
    key = obj['Key']
    # to read the S3 file contents as a string
    response = client.get_object(Bucket="<your bucket name>",
                                 Key=key)
    print(response['Body'].read().decode('utf-8'))
    # to download the file to the local machine
    client.download_file('<your bucket name>', key, key.replace('test/', ''))
The replace call strips the prefix so that the file is saved locally under its S3 file name; without it, the download would try to save to 'test/abc.txt'.

import os
import boto3
import json
s3 = boto3.resource('s3', aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
                    aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')
# download file into current directory
for s3_object in my_bucket.objects.all():
    # Need to split s3_object.key into path and file name, else it will give error file not found.
    path, filename = os.path.split(s3_object.key)
    my_bucket.download_file(s3_object.key, filename)
