File Migration from EC2 to S3 - Python

We are currently building a website that is essentially an upgrade to an old existing one. We would like to keep the old posts (which include images) on the new website. The old files are kept on an EC2 instance, while the new website is serverless and keeps all its files in S3.
My question: is there any way I could transfer the old files from EC2 to the new S3 bucket using Python? I would also like to rename and relocate the files to the new filename/filepath pattern that we devs decided on.

There is boto3, the AWS SDK for Python:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
import logging
import boto3
from botocore.exceptions import ClientError


def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
You can write a script around the upload_file function shown above and run it locally on your EC2 instance.
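To apply your new naming scheme during the transfer, you can walk the old upload directory on the EC2 instance and compute the new S3 key for each file before calling upload_file. A minimal sketch, assuming a hypothetical old-site path, bucket name, and key pattern (adjust new_key_for to whatever pattern your team decided on):

import os

OLD_ROOT = '/var/www/old-site/uploads'   # hypothetical location of the old files on EC2
NEW_BUCKET = 'new-website-bucket'        # hypothetical target bucket

def new_key_for(old_path):
    """Map an old file path to the new filename/filepath pattern (placeholder logic)."""
    relative = os.path.relpath(old_path, OLD_ROOT)
    return 'posts/images/' + relative.replace(os.sep, '/')

for root, _dirs, files in os.walk(OLD_ROOT):
    for name in files:
        old_path = os.path.join(root, name)
        upload_file(old_path, NEW_BUCKET, object_name=new_key_for(old_path))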

Related

Google cloud storage API not uploading blob to bucket

I have adhered to the Google Cloud Storage documentation for getting a local file on a server uploaded as a blob to a bucket I created, yet it does not work. It does work manually via the Cloud Console GUI, however. Note that my code for downloading an export-generated object from a bucket in a Cloud Storage sink works, and that also calls the Cloud Storage API, but the upload does not. I have tried to get some error info for debugging on stdout or stderr, to no avail. My credentials/authentication all check out, as I am able to use them for downloading. Also, the bucket I am trying to upload to has already been created in another function call; could it be that the storage client's bucket method is attempting to instantiate an already existing bucket, thus not getting past that and erroring out before reaching the upload blob method? Any insight is appreciated.
Python reference for the blob upload_from_filename function
GCS API guide for uploading objects to a bucket
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client.from_service_account_json(
        '/opt/gws/creds/gws-vault-data-export-ops-bfd51297e810.json')
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )

print('\n')
print('************List Blobs***********')
blobs_list = list_blobs(bucket_name_targ)
print('\n')
print(blobs_list)
print('\n')
blobs_list_set = set(map(lambda blob: blob.name, blobs_list))
print(blobs_list_set)
print('\n')
dl_dir = export['name']
blob_file = export_filename_dl
source_file_name = os.path.join(working_dir_param, dl_dir, blob_file)
print(source_file_name)
destination_blob_name = blob_file + 'api-upload'
upload_blob(bucket_name_targ, source_file_name, destination_blob_name)
#if blob_file not in blobs_list_set:
    #print('AT UPLOAD CONDITION BLOCK')
    #upload_blob(bucket_name_targ, source_file_name, destination_blob_name)
#else:
    #print('Upload did not succeed')
I got it working. It turns out that, since I am using a custom Ansible module with field parameters (including one for the Linux pwd of the export downloads, which the upload function gets passed as a string path), I had to correctly combine the custom module parameter with the downloaded directory/file path before passing it into source_file_name. Like so:
"pwd": {"default": None, "type": "str"}
working_dir_param = module.params["pwd"]
path_tail_actual = directory_main + filename_main
source_file_name_actual = working_dir_param + path_tail_actual
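As a side note, a slightly more robust way to build that path is to let os.path.join handle the separators (a suggestion, assuming directory_main and filename_main are plain path segments rather than strings that already carry slashes):

import os

# same intent as working_dir_param + directory_main + filename_main,
# but without worrying about leading/trailing slashes
source_file_name_actual = os.path.join(working_dir_param, directory_main, filename_main)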

gzip an image file through cloud function on storage trigger

I am trying to automate gzipping of images uploaded to a Cloud Storage bucket.
Every time I upload an image, I want a Cloud Function to run Python code that converts it to gzip and moves it to another bucket in Cloud Storage.
My code is not running; it gives me a "file not found" error. Also, what is the right way to give the full location?
My code is:
from google.cloud import storage
import gzip
import shutil

client = storage.Client()

def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for
        the event.
    """
    with open("/" + event['name'], 'rb') as f_input:
        with gzip.open("'/tmp/'+event['name']+'.gz'", "wb") as f_output:
            shutil.copyfileobj(f_input, f_output)
    source_bucket = client.bucket(event['bucket'])
    source_blob = source_bucket.blob("/tmp/" + event['name'])
    destination_bucket = client.bucket('baalti2')
    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, event['name']
    )
    print(
        "Blob {} in bucket {} moved to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
You are using Linux file system APIs (open, shutil) to access Cloud Storage, which will not work. Instead: copy the file from the bucket to local storage, gzip the file, then copy the gzipped file to the destination bucket. Use the Cloud Storage APIs to interact with Cloud Storage.
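A minimal sketch of those steps, assuming the same event payload and the destination bucket name from the question (download to the function's /tmp scratch space, gzip locally, then upload the result through the Cloud Storage API):

import gzip
import os
import shutil
from google.cloud import storage

client = storage.Client()

def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket."""
    name = event['name']
    local_path = os.path.join('/tmp', os.path.basename(name))
    gz_path = local_path + '.gz'

    # 1. Download the uploaded object to the function's local /tmp storage.
    source_bucket = client.bucket(event['bucket'])
    source_bucket.blob(name).download_to_filename(local_path)

    # 2. Gzip the local copy.
    with open(local_path, 'rb') as f_in, gzip.open(gz_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

    # 3. Upload the gzipped file to the destination bucket via the API.
    destination_bucket = client.bucket('baalti2')  # destination bucket from the question
    destination_bucket.blob(name + '.gz').upload_from_filename(gz_path)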

fetch the latest file in a folder and upload to s3?

The filename variable is used to get the name of the latest file.
My aim is to monitor a folder and, whenever a new file arrives, automatically upload it to an S3 bucket using boto3.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from subprocess import call
import os
import boto3

session = boto3.Session(aws_access_key_id='aws_access_key_id',
                        aws_secret_access_key='aws_secret_access_key',
                        region_name='region_name')
s3 = session.client('s3')

class Watcher:
    def __init__(self):
        self.dir = os.path.abspath('D:\\project')
        self.observer = Observer()

    def run(self):
        event_handler = Handler()
        self.observer.schedule(event_handler, self.dir, recursive=True)
        self.observer.start()
        try:
            while True:
                time.sleep(5)
        except:
            self.observer.stop()
            print("Error")
        self.observer.join()

class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        elif event.event_type == 'created':
            print("Received created event - %s." % event.src_path)
            s3.upload_file(Filename=event.src_path, bucket='bucketname', key='test-file-1')

if __name__ == '__main__':
    w = Watcher()
    w.run()
FileNotFoundError: [WinError 2] The system cannot find the file specified
As #alexhall mentioned in the comment, the s3.meta.client.upload_file method will upload a file. You can read about the boto3 S3 client's upload_file method in the documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file. However, the example there is a bit roundabout: it first creates an S3 resource rather than an S3 client, and then, because the S3 resource does not actually have a method to upload a file, it reaches back into the client via meta.client. You might as well directly create and use an S3 client for uploads.
You are also relying on the fact that boto3 uses the default session when you create the S3 resource like you did:
boto3.resource('s3')
This would work fine if you are running the code on Lambda, or on an EC2 instance that has an IAM role configured to access S3. But I think you are running this outside AWS, in which case you can first create a boto3.Session() using your credentials, and then a client (or resource) can use that session:
aws_access_key_id = '<AWS_ACCESS_KEY_ID>'
aws_secret_access_key = '<AWS_SECRET_ACCESS_KEY>'
region_name = 'us-east-1'

session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key,
                        region_name=region_name)
s3 = session.client('s3')
You can read about Session configuration here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
As mentioned above, because you are only trying to upload a file and do not seem to do anything else with S3, you may as well directly create an S3 client rather than creating an S3 resource and then reaching the client through meta.client.
Instead of the command = ... line, you can simply use:
s3.upload_file(Filename, Bucket = 'aaaaa', Key='test-file-1')
You can delete the last line. You would use 'call' if you were running an OS/system command rather than something within Python.
I am not sure if you are doing this to learn Python (boto3). If so, congrats.
If not, AWS already provides this functionality: you could keep everything else in your code but shell out to the AWS CLI instead.
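Putting those pieces together, the handler's upload call could look like this (a sketch; the bucket name is a placeholder and the key simply reuses the local file name):

import os

class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        if event.event_type == 'created':
            print("Received created event - %s." % event.src_path)
            # the correct keyword arguments are Filename, Bucket and Key
            s3.upload_file(Filename=event.src_path,
                           Bucket='bucketname',
                           Key=os.path.basename(event.src_path))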

Download specific files from AWS S3 bucket using boto3

I am trying to download specific files from an S3 bucket to a local machine.
The bucket structure is as follows:
BucketName/TT/2019/07/23/files.pdf
I want to download all files under:
BucketName/TT/2019/07/23
How can this be done?
Please try this:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('BucketName')
for obj in bucket.objects.filter(Prefix='TT/2019/07/23/'):
    filename = obj.key.split("/").pop()
    if filename != "":
        print('Downloading ', obj.key)
        bucket.download_file(obj.key, filename)
Note that you will first need to configure AWS authentication credentials. Please refer to the quick start guide to see how to do that.
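For reference, one common way to set those credentials up is a ~/.aws/credentials file that boto3 picks up automatically (the values below are placeholders):

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY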

Store uploaded image to AWS S3

I have a server application written in Python/Django (a REST API) that accepts a file upload from the client application. I want this uploaded file to be stored in AWS S3. I also want the file to be uploaded from the client as multipart/form-data. How can I achieve this? Any sample code or application will help me understand how it should be done. Please assist.
class FileUploadView(APIView):
    parser_classes = (FileUploadParser,)

    def put(self, request, filename, format=None):
        file_obj = request.data['file']
        self.handle_uploaded_file(file_obj)
        return self.get_response("", True, "", {})

    def handle_uploaded_file(self, f):
        destination = open('<path>', 'wb+')
        for chunk in f.chunks():
            destination.write(chunk)
        destination.close()
Thanks in advance
If you want your uploads to go directly to AWS S3, you can use django-storages and set your Django file storage backend to use AWS S3.
django-storages
django-storages documentation
This will allow your Django project to handle storage to S3 transparently, without you having to manually re-upload your files to S3.
Storage Settings
You will need to add at least these configurations to your Django settings:
# default remote file storage
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# aws access keys
AWS_ACCESS_KEY_ID = 'YOUR-ACCESS-KEY'
AWS_SECRET_ACCESS_KEY = 'YOUR-SECRET-ACCESS-KEY'
AWS_BUCKET_NAME = 'your-bucket-name'
AWS_STORAGE_BUCKET_NAME = AWS_BUCKET_NAME
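If you are on a newer release of django-storages, note that the boto-based backend above has since been replaced by a boto3-based one; the equivalent setting would look like this (a hedged note, not part of the original answer):

# boto3-based backend in recent django-storages versions
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'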
Example Code to Store Upload to Remote Storage
This is a modified version of your view, with the handle_uploaded_file method using Django's storage backend to save the uploaded file to the remote destination (using django-storages).
Note: Be sure to define DEFAULT_FILE_STORAGE and the AWS keys in your settings so django-storages can access your bucket.
from django.core.files.storage import default_storage
from django.core.files import File

# set file i/o chunk size to maximize throughput
FILE_IO_CHUNK_SIZE = 128 * 2**10


class FileUploadView(APIView):
    parser_classes = (FileUploadParser,)

    def put(self, request, filename, format=None):
        file_obj = request.data['file']
        self.handle_uploaded_file(file_obj)
        return self.get_response("", True, "", {})

    def handle_uploaded_file(self, f):
        """
        Write uploaded file to destination using default storage.
        """
        # set storage object to use Django's default storage
        storage = default_storage

        # set the relative path inside your bucket where you want the upload
        # to end up
        fkey = 'sub-path-in-your-bucket-to-store-the-file'

        # determine mime type -- you may want to parse the upload header
        # to find out the exact MIME type of the upload file.
        content_type = 'image/jpeg'

        # write file to remote server
        # * "file" is a File storage object that will use your
        #   storage backend (in this case, remote storage to AWS S3)
        # * "media" is a File object wrapping the uploaded file
        file = storage.open(fkey, 'w')
        storage.headers.update({"Content-Type": content_type})
        media = File(f)
        for chunk in media.chunks(chunk_size=FILE_IO_CHUNK_SIZE):
            file.write(chunk)
        file.close()
        media.close()
See more explanation and examples on how to access the remote storage here:
django-storages: Amazon S3
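If you do not need to control headers or chunking yourself, a much shorter variant (assuming the same django-storages settings) is to let default_storage.save() stream the uploaded file straight to the bucket:

from django.core.files.storage import default_storage

def handle_uploaded_file(self, f):
    # save() writes the uploaded file to the configured backend (S3 here)
    # and returns the name actually used for the stored object
    return default_storage.save('uploads/' + f.name, f)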
Take a look at the boto package, which provides the AWS APIs:
from boto.s3.connection import S3Connection

s3 = S3Connection(access_key, secret_key)
b = s3.get_bucket('<bucket>')
mp = b.initiate_multipart_upload('<object>')
for i in range(1, <parts> + 1):
    io = <receive-image-part>  # E.g. StringIO
    mp.upload_part_from_file(io, part_num=i)
mp.complete_upload()
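Note that boto is the legacy predecessor of boto3; a rough boto3 equivalent of the same multipart flow could look like this (receive_image_parts is a hypothetical generator yielding the uploaded chunks, each at least 5 MB except the last, and the bucket/key names are placeholders):

import boto3

s3 = boto3.client('s3')
bucket, key = 'your-bucket-name', 'your-object-key'  # placeholders

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
for i, chunk in enumerate(receive_image_parts(), start=1):  # hypothetical source of parts
    resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=i,
                          UploadId=mpu['UploadId'], Body=chunk)
    parts.append({'ETag': resp['ETag'], 'PartNumber': i})
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})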
