Uploading a file to GCS in GAE (Python)

I'm uploading files to my buckets in GCS through GAE:
upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='my_bucket')
as described in the documentation and this question
Everything works fine, but when I try to read the contents I find that the filename has been changed to a key value such as:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVwOW93MmJzVWRyZ2RQSHJpMlNhMkZNUkloYm9xcnZnZlFzNEZCYnpWaGNENGkROOFk5b2pHSHBMcDIwcGVrVFZtYzdROHRDRWFpdy50YTNpMFdpNmNCQU9NU0xt
Is there any way to get the uploaded name of the file?
Thanks in advance

see: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
"In your upload handler, you need to process the returned FileInfo metadata and explicitly store the GCS filename needed to retrieve the blob later."
More on FileInfo: https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclass
and I think this question is similar to How to get FileInfo/gcs file_name for files uploaded via BlobstoreUploadHandler?
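A minimal sketch of such an upload handler (webapp2), assuming the upload URL was created with create_upload_url('/upload', gs_bucket_name=...); the handler class name and what you do with the metadata are only illustrative:

from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # For uploads routed to GCS, get_file_infos() returns FileInfo
        # objects (get_uploads() would give you BlobInfo objects).
        file_info = self.get_file_infos()[0]
        original_name = file_info.filename         # the name the user uploaded
        gs_object_name = file_info.gs_object_name  # '/gs/<bucket>/<object>' path in GCS
        # Store both (e.g. in the datastore) so you can retrieve the blob
        # later under its original name.
        self.response.out.write(original_name)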

Related

How to upload an image to MongoDB using an S3 bucket and Boto3 in Python

I'm working on a Python application where the desired functionality is that the webcam is used to take in a live video feed and based on whether a condition is true, an image is clicked and uploaded to a database.
The database I am using is MongoDB. As far as I can understand, uploading images straight-up to a database is not the correct method. So, what I wanted to do is the following:
an image is clicked from the webcam
the image is uploaded to an S3 bucket (from the same Python script, so using boto3 perhaps)
a URL of the uploaded image is retrieved (this seems to be the tricky part)
and then this URL along with some other details is uploaded to the database. (this is the easy part)
My ideal workflow would be a way to take that image and upload it to an S3 bucket, retrieve the URL and then upload this URL to the database all in one .py script.
My question is: how do I upload an image to an S3 bucket and then retrieve its public URL all through boto3 in a Python script?
I also welcome any suggestions for a better approach/strategy for storing images in MongoDB. I saw on some pages that GridFS could be a good method, but that it is not recommended when image uploads happen frequently (and that using AWS is really the preferable way).
The URL of an S3 object can be constructed if you know the S3 bucket, region, and object key:
https://{bucket}.s3.{region}.amazonaws.com/{key}
Using boto3 will be the easiest way to upload a file if you're using Python anyway.
See another answer of mine on different ways how to upload files here: https://stackoverflow.com/a/67108609/13245310
You don't need to 'retrieve' the public URL: you specify the bucket and the name of the S3 object when you upload it, so you already have everything you need to know what the public URL will be once the upload finishes. S3 does not assign a new unique name to your object.
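Here is a rough sketch of that flow with boto3; the bucket name, region and key are placeholders, and the object (or bucket policy) must allow public reads for the URL to be accessible:

import boto3

bucket = 'my-image-bucket'          # placeholder
region = 'us-east-1'                # placeholder
key = 'webcam/snapshot-001.jpg'     # placeholder

# Upload the image file written by the webcam capture step.
s3 = boto3.client('s3', region_name=region)
s3.upload_file('snapshot-001.jpg', bucket, key)

# The public URL follows directly from the bucket, region and key.
url = 'https://{}.s3.{}.amazonaws.com/{}'.format(bucket, region, key)
# ...now store `url` plus any other details in MongoDB.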

How to upload HDF5 file directly to S3 bucket in Python

I want to upload an HDF5 file created with h5py to an S3 bucket without saving it locally, using boto3.
This solution uses pickle.dumps and pickle.loads, and the other solutions I have found store the file locally, which I would like to avoid.
You can use io.BytesIO() and put_object, as illustrated here. Hope this helps. Even in this case you would have to 'store' the data locally (though 'in memory'). You could also create a tempfile.TemporaryFile and then upload your file with put_object. I don't think you can stream to an S3 bucket in the sense that the local data would be discarded as it is uploaded to the bucket.
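For example, something along these lines (bucket and key names are placeholders; this relies on h5py >= 2.9, which can write directly to a file-like object):

import io
import boto3
import h5py
import numpy as np

buf = io.BytesIO()
# Write the HDF5 file into the in-memory buffer instead of to disk.
with h5py.File(buf, 'w') as f:
    f.create_dataset('data', data=np.arange(10))

buf.seek(0)
s3 = boto3.client('s3')
s3.put_object(Bucket='my-bucket', Key='data/example.h5', Body=buf.getvalue())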

Put Django file object into Tika server

In my project I am receiving multiple files using request.FILES.getlist('filedname') and saving them using the Django form's save method. I then read the same files using Tika's Python server API:
from tika import parser  # tika-python client

def read_by_tika(self, path):
    '''file reading using tika server'''
    parsed = parser.from_file(str(path))
    contents = parsed["content"].encode('utf-8')
    return contents
Is there any way to pass the files from request.FILES directly to the Tika server without saving them to disk?
If the files are small, try using Tika's .from_buffer() with file.read(). However, files over 2.5 MB are saved to temporary files by Django anyway (see "Where uploaded data is stored"); in that case use read_by_tika(file.temporary_file_path()). See also the file upload settings.
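A sketch that handles both cases for the files coming from request.FILES (the view function name is made up; 'filedname' matches the field name from the question):

from tika import parser

def extract_contents(request):
    contents = []
    for f in request.FILES.getlist('filedname'):
        if hasattr(f, 'temporary_file_path'):
            # Large upload: Django already spooled it to a temp file on disk.
            parsed = parser.from_file(f.temporary_file_path())
        else:
            # Small upload: still in memory, send the raw bytes to Tika.
            parsed = parser.from_buffer(f.read())
        contents.append(parsed.get('content'))
    return contents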

Storing multiple files with the same name in Google Cloud Storage?

So I am trying to port a Python webapp written with Flask to Google App Engine. The app hosts user uploaded files up to 200mb in size, and for non-image files the original name of the file needs to be retained. To prevent filename conflicts, e.g. two people uploading stuff.zip, each containing completely different and unrelated contents, the app creates a UUID folder on the filesystem and stores the file within that, and serves them to users. Google App Engine's Cloud Storage, which I was planning on using to store the user files, by making a bucket - according to their documentation has "no notion of folders". What is the best way to go about getting this same functionality with their system?
The current method, just for demonstration:
# generates a new folder with a shortened UUID name to save files
# other than images to avoid filename conflicts
else:
    # if there is a better way of doing this i'm not clever enough
    # to figure it out
    new_folder_name = shortuuid.uuid()[:9]
    os.mkdir(
        os.path.join(app.config['FILE_FOLDER'], new_folder_name))
    file.save(
        os.path.join(app.config['FILE_FOLDER'], new_folder_name, filename))
    new_folder_path = os.path.join(
        app.config['FILE_FOLDER'], new_folder_name)
    return url_for('uploaded_file', new_folder_name=new_folder_name)
From the Google Cloud Storage Client Library Overview documentation:
GCS and "subdirectories"
Google Cloud Storage documentation refers to "subdirectories" and the GCS client library allows you to supply subdirectory delimiters when you create an object. However, GCS does not actually store the objects into any real subdirectory. Instead, the subdirectories are simply part of the object filename. For example, if I have a bucket my_bucket and store the file somewhere/over/the/rainbow.mp3, the file rainbow.mp3 is not really stored in the subdirectory somewhere/over/the/. It is actually a file named somewhere/over/the/rainbow.mp3. Understanding this is important for using listbucket filtering.
While Cloud Storage does not support subdirectories per se, it allows you to use subdirectory delimiters inside filenames. This basically means that the path to your file will still look exactly as if it was inside a subdirectory, even though it is not. This apparently should concern you only when you're iterating over the entire contents of the bucket.
From the Request URIs documentation:
URIs for Standard Requests
For most operations you can use either of the following URLs to access objects:
storage.googleapis.com/<bucket>/<object>
<bucket>.storage.googleapis.com/<object>
This means that the public URL for their example would be http://storage.googleapis.com/my_bucket/somewhere/over/the/rainbow.mp3. The service interprets this as bucket=my_bucket and object=somewhere/over/the/rainbow.mp3 (i.e. no notion of subdirectories, just an object name with embedded slashes); the browser, however, just sees the path /my_bucket/somewhere/over/the/rainbow.mp3 and treats it as if the filename were rainbow.mp3.
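So the UUID-folder scheme ports over almost unchanged: make the UUID part of the object name. A sketch using the google-cloud-storage client (a different library from the GAE-bundled one quoted above; the bucket name and function are placeholders):

import uuid
from google.cloud import storage

def upload_user_file(bucket_name, file_obj, filename):
    # Prefix the object name with a shortened UUID "folder" so two users
    # uploading stuff.zip never collide.
    object_name = '{}/{}'.format(uuid.uuid4().hex[:9], filename)
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_file(file_obj)
    return 'https://storage.googleapis.com/{}/{}'.format(bucket_name, object_name)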

Writing files to google app engine blobstore as the methods are going to be deprecated

I want to save some data fetched from the web to the Blobstore, but the Google docs say:
Deprecated: The Files API feature used here to write files to Blobstore is going to be removed at some time in the future, in favor of writing files to Google Cloud Storage and using Blobstore to serve them.
The code in Python is as follows:
from __future__ import with_statement
from google.appengine.api import files

# Create the file
file_name = files.blobstore.create(mime_type='application/octet-stream')

# Open the file and write to it
with files.open(file_name, 'a') as f:
    f.write('data')

# Finalize the file. Do this before attempting to read it.
files.finalize(file_name)

# Get the file's blob key
blob_key = files.blobstore.get_blob_key(file_name)
I am wondering if there is another way to write to blobstore instead of the official upload method.
If you want to use a file-like API, you have to go with GCS.
Blobstore is for uploading more-or-less static images and serving them.
If you want to write using a file-like API and then serve from Blobstore, you can write to GCS and get a BlobKey for the file.
https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
But writing to BlobStore like you want is deprecated. Stop trying to do it that way.
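A rough sketch of that GCS-plus-BlobKey approach using the GAE cloudstorage library (bucket and object names are placeholders):

import cloudstorage as gcs
from google.appengine.ext import blobstore

filename = '/my_bucket/my_object'

# Write the fetched data to GCS with a file-like API.
with gcs.open(filename, 'w', content_type='application/octet-stream') as f:
    f.write('data')

# Get a BlobKey so the file can still be served through the Blobstore API.
blob_key = blobstore.create_gs_key('/gs' + filename)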
An option may be to put the data in the datastore using a TextProperty.