I am trying to set up an app where users can download their files stored in an S3 bucket. I am able to set up my bucket and get the correct file, but it won't download, giving me this error: No such file or directory: 'media/user_1/imageName.jpg'. Any idea why? This seems like a relatively easy problem, but I can't quite figure it out. I can delete an image properly, so it is able to identify the correct image.
Here's my views.py
def download(request, project_id=None):
    conn = S3Connection('AWS_BUCKET_KEY', 'AWS_SECRET_KEY')
    b = Bucket(conn, 'BUCKET_NAME')
    k = Key(b)
    instance = get_object_or_404(Project, id=project_id)
    k.key = 'media/' + str(instance.image)
    k.get_contents_to_filename(str(k.key))
    return redirect("/dashboard/")
The problem is that you are downloading to a local directory that doesn't exist (media/user_1). You need to either:
1. Create the directory on the local machine first
2. Just use the filename rather than a full path
3. Use the full path, but replace slashes (/) with another character -- this ensures a unique filename without having to create directories
The last option could be achieved via:
k.get_contents_to_filename(str(k.key).replace('/', '_'))
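For the first option, a minimal sketch inside the question's download view might look like this (assuming Python 3; k is the boto Key from the question's code):

import os

local_path = str(k.key)                                   # e.g. 'media/user_1/imageName.jpg'
os.makedirs(os.path.dirname(local_path), exist_ok=True)   # create 'media/user_1/' locally first
k.get_contents_to_filename(local_path)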
See also: Boto3 to download all files from a S3 Bucket
Downloading files using boto3 is very simple; configure your AWS credentials at the system level before using this code.
import boto3

client = boto3.client('s3')

# If your bucket name is 'mybucket' and the file path is 'test/abc.txt',
# then Bucket='mybucket' and Prefix='test'
resp = client.list_objects_v2(Bucket="<your bucket name>", Prefix="<prefix of the s3 folder>")

for obj in resp['Contents']:
    key = obj['Key']
    # To read the S3 file contents as a string
    response = client.get_object(Bucket="<your bucket name>", Key=key)
    print(response['Body'].read().decode('utf-8'))
    # To download the file to local storage
    client.download_file('<your bucket name>', key, key.replace('test/', ''))
The replace() strips the 'test/' prefix so the file is saved locally under its own name ('abc.txt'); if you don't replace it, the code will try to save it as 'test/abc.txt', which fails unless that local directory exists.
import os
import boto3

s3 = boto3.resource('s3',
                    aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
                    aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')

# Download each file into the current directory
for s3_object in my_bucket.objects.all():
    # Split s3_object.key into path and file name; otherwise keys that contain
    # folders raise a "file not found" error.
    path, filename = os.path.split(s3_object.key)
    my_bucket.download_file(s3_object.key, filename)
Related
Attempting to retrieve a file from FTP and save it to an S3 bucket within a Lambda function.
I can confirm the first part of the code works, as I can see the list of files printed to CloudWatch logs.
import ftplib
from ftplib import FTP
import zipfile
import boto3
s3 = boto3.client('s3')
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'
ftp = FTP('ftp.godaddy.com')
ftp.login(user='auctions', passwd='')
ftp.retrlines('LIST')
The next part was resulting in the following error:
module initialization error: [Errno 30] Read-only file system: 'tdnam_all_listings.csv.zip'
However, I managed to overcome this by adding '/tmp' to the file location, as per the following code:
fileName = 'all_expiring_auctions.json.zip'
with open('/tmp/fileName', 'wb') as file:
    ftp.retrbinary('RETR ' + fileName, file.write)
Next, I am attempting to unzip the file from the temporary location:
with zipfile.ZipFile('/tmp/fileName', 'r') as zip_ref:
    zip_ref.extractall('')
Finally, I am attempting to save the file to a particular 'folder' in the S3 bucket, as follows:
data = open('/tmp/all_expiring_auctions.json')
s3.Bucket('brnddmn-s3').upload_fileobj('data','my-s3bucket/folder/')
The code produces no errors that I can see in the log; however, the unzipped file is not reaching the destination despite my efforts.
Any help greatly appreciated.
Firstly, you have to use the /tmp directory for working with files in Lambda. ZipFile's extractall('') will, however, create the extract in your current working directory, assuming the zip content is a simple plain text file with no relative path. To create the extract in the /tmp directory, use
zip_ref.extractall('/tmp')
I'm not sure why there are no errors logged; data = open(...) should throw an error if no file is found. If required, you can explicitly check whether the file exists:
import os
print(os.path.exists('/tmp/all_expiring_auctions.json'))  # True/False
Finally, once you have ensured the file exists, the argument for the bucket should be the bucket name; it is unclear whether yours is 'brnddmn-s3' or 'my-s3bucket'. Also note that Bucket() is a method of a boto3 resource, whereas your s3 variable is a boto3 client, so it is simplest to call upload_fileobj() on the client directly. The file argument to upload_fileobj() should be a file object opened in binary mode (data, not the string 'data'), and the key argument should be the object key (the filename in S3) rather than a folder name.
Putting it together, the last lines should look like this:
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'  # Replace with your S3 bucket name
with open('/tmp/all_expiring_auctions.json', 'rb') as data:
    s3.upload_fileobj(data, S3_OUTPUT_BUCKETNAME, 'folder/all_expiring_auctions.json')
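For completeness, a rough end-to-end sketch of the corrected flow (the FTP host, login, file names, bucket, and target key are taken from the question; the /tmp path handling is an assumption):

import zipfile
from ftplib import FTP

import boto3

s3 = boto3.client('s3')
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'
fileName = 'all_expiring_auctions.json.zip'

# Download the zip from FTP into Lambda's writable /tmp directory
ftp = FTP('ftp.godaddy.com')
ftp.login(user='auctions', passwd='')
with open('/tmp/' + fileName, 'wb') as f:
    ftp.retrbinary('RETR ' + fileName, f.write)

# Extract into /tmp as well
with zipfile.ZipFile('/tmp/' + fileName, 'r') as zip_ref:
    zip_ref.extractall('/tmp')

# Upload the extracted file to the bucket under the desired key
with open('/tmp/all_expiring_auctions.json', 'rb') as data:
    s3.upload_fileobj(data, S3_OUTPUT_BUCKETNAME, 'folder/all_expiring_auctions.json')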
I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".
Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all its content?
For example --> sample-data/a/foo.txt, more_files/foo1.txt
In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which contains foo1.txt.
I know how to download a single file. For instance, if I wanted foo.txt I would do the following.
s3 = boto3.client('s3')
s3.download_file("sample-data", "a/foo.txt", "foo.txt")
However, I am wondering if I can download the folder called a and all its contents entirely? Any help would be appreciated.
I think your best bet would be the awscli
aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
From the docs:
--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
EDIT:
To do this with boto3 try this:
import os
import errno
import boto3

client = boto3.client('s3')


def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise


def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'
    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually
        for key in result['Contents']:
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)


download_dir('your_bucket', 'your_folder', 'destination')
List all the objects in the folder you want to download, then iterate over them file by file and download each one.
import boto3

s3 = boto3.client("s3")

response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
)
The response is of type dict. The key that contains the list of the file names is "Contents".
Here is more information:
list all files in a bucket
boto3 documentation
I am not sure if this is the fastest solution, but it can help you.
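A minimal sketch of the iterate-and-download step described above (BUCKET and the prefix follow the listing call; the local target directory is an assumption):

import os

import boto3

s3 = boto3.client("s3")

BUCKET = "my-bucket"      # placeholder bucket name
PREFIX = "DIR1/DIR2"      # prefix used in the listing above
TARGET = "downloads"      # local directory to download into (assumption)

response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

for obj in response.get("Contents", []):
    key = obj["Key"]
    if key.endswith("/"):        # skip zero-length "folder" placeholder objects
        continue
    local_path = os.path.join(TARGET, os.path.relpath(key, PREFIX))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    s3.download_file(BUCKET, key, local_path)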
How can we delete files inside an S3 folder using boto3?
P.S. Only the files should be deleted; the folder should remain.
You would have to use delete_object():
import boto3

s3_client = boto3.client('s3')

response = s3_client.delete_object(
    Bucket='my-bucket',
    Key='invoices/January.pdf'
)
If you are asking how to delete ALL files within a folder, then you would need to loop through all objects with a given Prefix:
import boto3

s3_client = boto3.client('s3')

BUCKET = 'my-bucket'
PREFIX = 'folder1/'

response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

for obj in response['Contents']:
    print('Deleting', obj['Key'])
    s3_client.delete_object(Bucket=BUCKET, Key=obj['Key'])
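If the prefix holds more than 1000 objects, a single list_objects_v2 call will not return them all; a minimal sketch using a paginator (reusing the same placeholder names) could be:

import boto3

s3_client = boto3.client('s3')

BUCKET = 'my-bucket'
PREFIX = 'folder1/'

# Paginate in case the prefix contains more than 1000 objects
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        print('Deleting', obj['Key'])
        s3_client.delete_object(Bucket=BUCKET, Key=obj['Key'])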
Also, please note that folders do not actually exist in Amazon S3. The Key (filename) of an object contains the full path of the object. If necessary, you can create a zero-length object with the name of a folder to make the folder 'appear', but this is not necessary. Merely creating an object in a given path will make any subfolders sort of 'appear', but they will 'disappear' when the object is deleted (since folders don't actually exist).
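For illustration, such a zero-length 'folder marker' can be created with a plain put_object call (the key name here is hypothetical):

import boto3

s3_client = boto3.client('s3')
# Zero-length object whose key ends in '/' -- this is what makes 'folder1/' appear as a folder
s3_client.put_object(Bucket='my-bucket', Key='folder1/')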
I am trying to upload a file to S3 in my bucket, using the following code, which is working absolutely fine.
#!/usr/bin/python
import os
import boto
import boto.s3.connection
from boto.s3.key import Key
from boto.s3.connection import S3Connection
from datetime import datetime
try:
    conn = boto.s3.connect_to_region('us-east-1',
        aws_access_key_id='AKXXXXXXXXXXXXA',
        aws_secret_access_key='cXXXXXXXXXXXXXXXXXXXXXXXXX2',
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    print conn
    filename = '/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf'
    bucket = conn.get_bucket('bucketName', validate=False)
    key_name = filename
    print "file to upload", key_name
    secure_https_url = 'https://{host}/{bucket}{key}'.format(
        host=conn.server_name(),
        bucket='bucketName',
        key=key_name)
    print "secure_https_url", secure_https_url
    k = bucket.new_key(key_name)
    mp = k.set_contents_from_filename(key_name)
    print "File uploaded successfully"
except Exception, e:
    print str(e)
    print "error"
Now the problem is that, as my file name is '/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf', it creates a hierarchical structure in the bucket when storing my file, so I get the file path https://s3.amazonaws.com/bucketName/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf. I want to change this hierarchy to https://s3.amazonaws.com/bucketName/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04/cecb834f-ae85-49e3-b8a1-a6nC^U)3GcZ)M62d643aa7-d047-498c-bf59-8__Invoice (7).pdf. Is there any option to do this with boto, or should I handle it in Python, given that uploading a file to S3 requires the absolute path of the file? I am using Django and this is my Celery task.
The function set_contents_from_filename(key_name) stores the object in S3 under whatever key name you give it, as-is. If your key name contains any /, it will create a hierarchical structure. For your situation I suggest you work with two paths. One is your base local path, which contains your files; it holds all the files that you want to upload to S3 (use os.path.join for building it). The other is the AWS path, which is the hierarchy you want to create in your S3 bucket. As an example, you could declare the AWS path as:
/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04
Then you append the filename to the AWS path to build the key_name that you pass to new_key().
Boto will work fine here.
Example:
Create the S3 path that you want in your S3 storage, and also a base local path that contains all the files you want to upload. The filename has to be appended to both to address a specific file. The new_key function creates a key, i.e. the path under which your file will be stored. The set_contents_from_filename function takes a local file path and stores that file in S3 under the key (path) created above.
k = bucket.new_key(s3_path + filename)
mp = k.set_contents_from_filename(base_local_path + filename)
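A minimal sketch of that approach, with hypothetical base_local_path, s3_path, and filename values chosen to match the question's layout (credentials are assumed to come from the environment or the boto config file):

import os
import boto

conn = boto.connect_s3()  # credentials from environment/boto config
bucket = conn.get_bucket('bucketName', validate=False)

# Hypothetical paths for illustration -- adjust to your own layout
base_local_path = '/home/rahul/GitDjangopostgres/AWSNewView/company/ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04'
s3_path = 'ExtraPaymentDocuments/Argus_ATM_e_surveillance/Gujarat/Ahmedabad/P3DCAM04'
filename = 'Invoice.pdf'

# Key inside the bucket: the desired S3 hierarchy plus the file name
k = bucket.new_key(s3_path + '/' + filename)
# Local path of the file to read and upload
k.set_contents_from_filename(os.path.join(base_local_path, filename))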
I'm trying to upload files from my local system to GCS using boto in the cloud. After a file gets uploaded I get an error which says "The MD5 you specified in Content-MD5 or x-goog-hash did not match what we computed." Below is my code.
def upload():
    bucket_name = 'bucketname'
    bucket = conn.get_bucket(bucket_name)
    fpic = Key(bucket)
    d = 'E:/Eclipse/workspace/Files'
    for filename in os.listdir(d):
        contents = d + '/' + filename
        fpic.key = 'my-files' + filename
        fpic.set_contents_from_filename(contents, {}, replace=True)
There is also another way to upload files from your local machine to GCS using boto; find the link below [1]. Try that; it should work for you without this error.
[1] https://cloud.google.com/storage/docs/gspythonlibrary#credentials
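As a rough illustration of the approach described at that link, a sketch using boto's storage_uri interface might look like this (the bucket name, 'my-files' prefix, and local directory are taken from the question, with a trailing slash added to the prefix; credentials are assumed to be configured in the .boto file as the linked page describes):

import os
import boto

GOOGLE_STORAGE = 'gs'
d = 'E:/Eclipse/workspace/Files'
bucket_name = 'bucketname'

for filename in os.listdir(d):
    local_path = os.path.join(d, filename)
    # Destination object in GCS
    dst_uri = boto.storage_uri(bucket_name + '/my-files/' + filename, GOOGLE_STORAGE)
    # Upload the local file
    with open(local_path, 'rb') as f:
        dst_uri.new_key().set_contents_from_file(f)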