I have a server application written in Python/Django (a REST API) that accepts a file upload from the client application. I want this uploaded file to be stored in AWS S3. I also want the file to be uploaded from the client as multipart/form-data. How can I achieve this? Any sample code or application will help me understand the way it should be done. Please assist.
class FileUploadView(APIView):
    parser_classes = (FileUploadParser,)

    def put(self, request, filename, format=None):
        file_obj = request.data['file']
        self.handle_uploaded_file(file_obj)
        return self.get_response("", True, "", {})

    def handle_uploaded_file(self, f):
        destination = open('<path>', 'wb+')
        for chunk in f.chunks():
            destination.write(chunk)
        destination.close()
Thanks in advance
If you want your uploads to go directly to AWS S3, you can use django-storages and set your Django file storage backend to use AWS S3.
django-storages
django-storages documentation
This lets your Django project store files to S3 transparently, without you having to manually re-upload each uploaded file to S3.
Storage Settings
You will need to add at least these configurations to your Django settings:
# default remote file storage
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# aws access keys
AWS_ACCESS_KEY_ID = 'YOUR-ACCESS-KEY'
AWS_SECRET_ACCESS_KEY = 'YOUR-SECRET-ACCESS-KEY'
AWS_BUCKET_NAME = 'your-bucket-name'
AWS_STORAGE_BUCKET_NAME = AWS_BUCKET_NAME
Example Code to Store Upload to Remote Storage
This is a modified version of your view, with the handle_uploaded_file method using Django's storage backend to save the uploaded file to the remote destination (via django-storages).
Note: Be sure to define DEFAULT_FILE_STORAGE and the AWS keys in your settings so django-storages can access your bucket.
from django.core.files.storage import default_storage

# file i/o chunk size (128 KiB) used when streaming the upload
FILE_IO_CHUNK_SIZE = 128 * 2**10


class FileUploadView(APIView):
    parser_classes = (FileUploadParser,)

    def put(self, request, filename, format=None):
        file_obj = request.data['file']
        self.handle_uploaded_file(file_obj)
        return self.get_response("", True, "", {})

    def handle_uploaded_file(self, f):
        """
        Write the uploaded file to its destination using default storage.
        """
        # use Django's default storage backend (here: django-storages / S3)
        storage = default_storage

        # the relative path inside your bucket where you want the upload
        # to end up
        fkey = 'sub-path-in-your-bucket-to-store-the-file'

        # note: the S3 backend derives the Content-Type from the key's file
        # extension, so give fkey an extension that matches the upload; you
        # may want to inspect the upload headers for the exact MIME type

        # "f" is already a Django File object, so its chunks can be streamed
        # straight into the storage backend -- there is no need to reopen a
        # file from the local file system
        destination = storage.open(fkey, 'wb')
        for chunk in f.chunks(chunk_size=FILE_IO_CHUNK_SIZE):
            destination.write(chunk)
        destination.close()
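For completeness, here is a hypothetical client-side call against this view (the /upload/photo.jpg route is an assumption; adjust it to your URL conf). Note that FileUploadParser expects the raw file as the request body; if the client must send multipart/form-data instead, swap in MultiPartParser and read the file from request.data the same way.

import requests

# hypothetical endpoint -- adjust to match your urls.py routing
url = 'https://api.example.com/upload/photo.jpg'

with open('photo.jpg', 'rb') as fp:
    # FileUploadParser: send the raw bytes as the request body
    resp = requests.put(url, data=fp, headers={'Content-Type': 'image/jpeg'})

print(resp.status_code)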
See more explanation and examples on how to access the remote storage here:
django-storages: Amazon S3
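As a quick illustration of accessing the remote storage (a minimal sketch; the key name below is made up), reading a stored object back goes through the same storage API:

from django.core.files.storage import default_storage

# hypothetical key written earlier to the bucket
key = 'sub-path-in-your-bucket-to-store-the-file'

if default_storage.exists(key):
    with default_storage.open(key, 'rb') as fp:
        data = fp.read()
    # public or signed URL for the object, depending on your settings
    print(default_storage.url(key))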
Take a look at the boto package, which provides the AWS APIs:
from boto.s3.connection import S3Connection

s3 = S3Connection(access_key, secret_key)
b = s3.get_bucket('<bucket>')
mp = b.initiate_multipart_upload('<object>')
for i in range(1, <parts>+1):
    io = <receive-image-part>  # E.g. StringIO
    mp.upload_part_from_file(io, part_num=i)
mp.complete_upload()
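A minimal sketch of how the view above could feed an upload into this API (boto 2 style; the credentials, bucket name, and key are placeholders, and uploaded_file stands for the request.data['file'] object from the view). S3 requires every multipart part except the last to be at least 5 MB, hence the chunk size:

from io import BytesIO

from boto.s3.connection import S3Connection

# assumed credentials and names -- replace with your own
s3 = S3Connection('YOUR-ACCESS-KEY', 'YOUR-SECRET-ACCESS-KEY')
bucket = s3.get_bucket('your-bucket-name')
mp = bucket.initiate_multipart_upload('uploads/photo.jpg')

part_num = 0
for chunk in uploaded_file.chunks(chunk_size=5 * 1024 * 1024):
    part_num += 1
    mp.upload_part_from_file(BytesIO(chunk), part_num=part_num)
mp.complete_upload()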
Related
I am trying to automate gzipping of images uploaded to a Cloud Storage bucket.
Every time I upload an image, I want a Cloud Function to run Python code that converts it to gzip and moves it to another bucket in storage.
My code is not working; it's giving me a "file not found" error. Also, what's the right way to give the full location?
My code is:
from google.cloud import storage
import gzip
import shutil

client = storage.Client()


def hello_gcs(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for
        the event.
    """
    with open("/"+event['name'], 'rb') as f_input:
        with gzip.open("'/tmp/'+event['name']+'.gz'", "wb") as f_output:
            shutil.copyfileobj(f_input, f_output)
    source_bucket = client.bucket(event['bucket'])
    source_blob = source_bucket.blob("/tmp/" + event['name'])
    destination_bucket = client.bucket('baalti2')
    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, event['name']
    )
    print(
        "Blob {} in bucket {} moved to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
You are using Linux file system APIs (open, shutil.copyfileobj) to access Cloud Storage, which will not work. Use the Cloud Storage APIs to interact with Cloud Storage: download the file from the bucket to local storage, gzip it, then upload the gzipped file to the destination bucket.
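A minimal sketch of those steps, keeping the question's event signature and destination bucket name ('baalti2'); note that /tmp is the only writable path inside a Cloud Function:

import gzip
import os
import shutil

from google.cloud import storage

client = storage.Client()


def hello_gcs(event, context):
    """Gzip a newly uploaded object and store it in a second bucket."""
    name = event['name']
    local_path = os.path.join('/tmp', os.path.basename(name))
    gzip_path = local_path + '.gz'

    # 1. download the object from the source bucket to local storage
    source_bucket = client.bucket(event['bucket'])
    source_bucket.blob(name).download_to_filename(local_path)

    # 2. gzip the local copy
    with open(local_path, 'rb') as f_in, gzip.open(gzip_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

    # 3. upload the gzipped file to the destination bucket
    destination_bucket = client.bucket('baalti2')
    destination_bucket.blob(name + '.gz').upload_from_filename(gzip_path)

    # clean up /tmp (it counts against the function's memory)
    os.remove(local_path)
    os.remove(gzip_path)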
I'm now trying to write a simple Python script that will run when called through an API call or through a Google Cloud Function. I'm very, very new to GCP and Python, as I'm more familiar with Azure and PowerShell, but I need to know what I need to use/call in order to upload a file to a bucket and also read the file information, plus then connect to a MongoDB database.
Here is the flow of what I need to do:
1. The API/function will be called with its URL and attached to it will be an actual file, like a seismic file type.
2. When the API/function is called, a Python script will run that will grab that file and upload it to a bucket.
3. Then I need to run commands against the uploaded file to retrieve items like "version", "company", "wellname", etc.
4. Then I want to upload a document, with all of these values, into a MongoDB database.
We're basically trying to replicate something we did in Azure with Functions and a CosmosDB instance. There, we created a function that would upload the file to Azure storage, then retrieve values from the file, which I believe is its metadata. Afterwards, we would upload a document to CosmosDB with these values. It's a way of recording values retrieved from the file itself. Any help would be appreciated, as this is part of a POC I'm trying to present on!! Please ask any questions!
To answer your questions:
Here's code showing how to upload a file to a Cloud Storage bucket using Python:
from google.cloud import storage


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
Make sure that you have the Cloud Storage client library listed in your requirements.txt. Example:
google-cloud-storage>=1.33.0
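A rough sketch of how the function itself could receive the attached file and hand it to upload_blob (an HTTP-triggered Cloud Function, which receives a Flask request; the field name 'file' and the bucket name are assumptions, and upload_blob from the snippet above is assumed to live in the same module):

def handle_upload(request):
    """HTTP Cloud Function: save an attached file to a bucket."""
    # assumes the client sends multipart/form-data with a "file" field
    uploaded = request.files.get('file')
    if uploaded is None:
        return 'No file attached', 400

    # /tmp is the writable scratch space inside a Cloud Function
    local_path = '/tmp/' + uploaded.filename
    uploaded.save(local_path)

    # reuse the upload_blob helper defined above
    upload_blob('your-bucket-name', local_path, uploaded.filename)
    return 'Uploaded {}'.format(uploaded.filename), 200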
To retrieve and view the object metadata, here's the code, with a link as reference:
from google.cloud import storage


def blob_metadata(bucket_name, blob_name):
    """Prints out a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)

    print("Blob: {}".format(blob.name))
    print("Bucket: {}".format(blob.bucket.name))
    print("Storage class: {}".format(blob.storage_class))
    print("ID: {}".format(blob.id))
    print("Size: {} bytes".format(blob.size))
    ...
Finally, to connect to MongoDB using Python, here's a sample project which could help you understand how it works.
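If it helps, a minimal sketch with the pymongo driver (the connection string, database, collection, and field values below are placeholders) for inserting a document with the values extracted from the file:

from pymongo import MongoClient

# placeholder connection string -- use your own cluster/credentials
client = MongoClient('mongodb://localhost:27017/')
db = client['seismic']

document = {
    'version': '1.0',
    'company': 'ACME',
    'wellname': 'Well-42',
    'gcs_object': 'gs://your-bucket-name/storage-object-name',
}
result = db.files.insert_one(document)
print('Inserted document with id {}'.format(result.inserted_id))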
We are currently creating a website that is an upgrade to an old existing one. We would like to keep the old posts (which include images) on the new website. The old files are kept on an EC2 instance, while the new website is serverless and keeps all its files in S3.
My question: is there any way I could transfer the old files (from EC2) to the new S3 bucket using Python? I would like to rename and relocate the files following the new filename/path pattern that we devs decided on.
There is boto3, the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
import logging
import boto3
from botocore.exceptions import ClientError


def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
You can write a script using the S3 upload_file function, then run it locally on your EC2 instance.
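As a rough sketch of the rename-and-relocate part (the root directory, bucket name, and key pattern below are assumptions to adapt to your own scheme; upload_file is the helper defined above):

import os

# walk the old files on the EC2 instance and upload them under new keys
root = '/var/www/old-site/uploads'  # assumed location of the old files
bucket = 'your-new-bucket'

for dirpath, _, filenames in os.walk(root):
    for filename in filenames:
        local_path = os.path.join(dirpath, filename)
        # hypothetical new key pattern: posts/<original file name>
        new_key = 'posts/' + filename
        if upload_file(local_path, bucket, new_key):
            print('Uploaded {} as {}'.format(local_path, new_key))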
I'm a Ruby dev trying my hand at Google Cloud Functions written in Python and have hit a wall with transferring a remote file from a given URL to Google Cloud Storage (GCS).
In an equivalent RoR app I download to the app's ephemeral storage and then upload to GCS.
I am hoping there's a way to simply 'download' the remote file to my GCS bucket via the Cloud Function.
Here's a simplified example of what I am doing, with some comments. The real code fetches the URLs from a private API, but that part works fine and isn't where the issue is.
from google.cloud import storage

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')

# This works fine
#source_file_name = 'localfile.txt'

# When using a remote URL I get 'IOError: [Errno 2] No such file or directory'
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)


upload_blob(bucket_name, source_file_name, destination_blob_name)
Thanks in advance.
It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script in a local environment, the file contents that you want to upload need to be in that same environment. This means the contents of the URL need to be stored either in memory or in a file.
Here are two examples showing how to do it, based on your code:
Option 1: You can use the wget module, which will fetch the URL and download its contents into a local file (similar to the wget CLI command). Note that this means the file is stored locally first and then uploaded from that file. I added the os.remove line to remove the file once the upload is done.
from google.cloud import storage
import wget
import os

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    filename = wget.download(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(filename, content_type='image/jpg')
    os.remove(filename)


upload_blob(bucket_name, source_file_name, destination_blob_name)
Option 2: Using the urllib module works similarly to the wget module, but instead of writing to a file it reads the contents into a variable. Note that I wrote this example for Python 3; there are some differences if you plan to run your script under Python 2.x.
from google.cloud import storage
import urllib.request

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    file = urllib.request.urlopen(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(file.read(), content_type='image/jpg')


upload_blob(bucket_name, source_file_name, destination_blob_name)
Directly transferring URLs into GCS is possible through the Cloud Transfer service, but setting up a cloud transfer job for a single URL is a lot of overhead. That sort of solution is targeted towards a situation with millions of URLs that need to become GCS objects.
Instead, I recommend writing a job that pumps the incoming stream from the URL into a write stream to GCS, and running it somewhere in Google Cloud close to the bucket.
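A minimal sketch of that idea (assuming the requests library is available; the source URL, bucket, and object names are placeholders): stream the HTTP response body straight into the blob rather than buffering the whole file.

import requests
from google.cloud import storage

source_url = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

# stream the response body; response.raw is a file-like object, so the blob
# can consume it without loading the whole file into memory
with requests.get(source_url, stream=True) as response:
    response.raise_for_status()
    blob.upload_from_file(response.raw,
                          content_type=response.headers.get('Content-Type'))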
We currently use blobstore.create_upload_url to create upload URLs to be used on the frontend; see Uploading a blob.
However, with Google's push toward Google Cloud Storage (GCS), I'd like to use GCS instead of the blobstore. I can't find anything equivalent to blobstore.create_upload_url in the GCS documentation. Am I missing something? Is there a better way to upload files to GCS from the frontend?
Thanks
Rob
If you provide gs_bucket_name to blobstore.create_upload_url, the file will be stored in GCS instead of the blobstore. This is described in the official documentation: Using the Blobstore API with Google Cloud Storage
blobstore.create_upload_url(
    success_path=webapp2.uri_for('upload'),
    gs_bucket_name="mybucket/dest/location")
You can take a look at a simple upload handler implementation made with webapp2:
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2
import cloudstorage as gcs


class Upload(blobstore_handlers.BlobstoreUploadHandler):
    """Upload handler

    To upload a new file you need to follow these steps:
    1. send a GET request to /upload to retrieve the upload session URL
    2. send a POST request to the URL retrieved in step 1
    """

    def post(self):
        """Copy uploaded files to the provided bucket destination"""
        fileinfo = self.get_file_infos()[0]
        uploadpath = fileinfo.gs_object_name[3:]
        stat = gcs.stat(uploadpath)

        # remove the auto-generated filename from the upload path
        destpath = "/".join(stat.filename.split("/")[:-1])

        # copy the file to the desired location with the proper filename
        gcs.copy2(uploadpath, destpath)

        # remove the file from the upload path
        gcs.delete(uploadpath)

    def get(self):
        """Returns a URL to open an upload session"""
        self.response.write(blobstore.create_upload_url(
            success_path=webapp2.uri_for('upload'),
            gs_bucket_name="mybucket/subdir/subdir2/filename.ext"))
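For webapp2.uri_for('upload') to resolve, the handler needs to be registered under a named route. A minimal sketch of the corresponding WSGI application (the route path is an assumption):

app = webapp2.WSGIApplication([
    # the route name 'upload' is what webapp2.uri_for('upload') looks up
    webapp2.Route('/upload', handler=Upload, name='upload'),
], debug=True)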