How to serve Cloud Storage files using the App Engine SDK - Python

In App Engine I can serve Cloud Storage files, such as a PDF, from the default bucket of my application:
http://storage.googleapis.com/<appid>.appspot.com/<file_name>
But how can I serve local cloudstorage files in the SDK, without making use of a blob_key?
I write to the default bucket like this:
gcs_file_name = '/%s/%s' % (app_identity.get_default_gcs_bucket_name(), file_name)
with gcs.open(gcs_file_name, 'w') as f:
    f.write(data)
The name of the default bucket in the SDK = 'app_default_bucket'
In the SDK datastore I have a Kind: GsFileInfo showing: filename: /app_default_bucket/example.pdf
Update and workaround: You can get a serving url for NON image files like css, js and pdf.
gs_file = '/gs/%s/%s/%s' % (app_identity.get_default_gcs_bucket_name(), folder, filename)
serving_url = images.get_serving_url(blobstore.create_gs_key(gs_file))

UPDATE: I found this feature to serve Cloud Storage files using the SDK.
This feature has not been documented yet.
http://localhost:8080/_ah/gcs/app_default_bucket/filename
This means we do not need the image serving URL to serve non-image files, as shown below!
To create a serving URL for Cloud Storage files like images, css, js and pdf's in the default bucket, I use this code for testing (SDK) and GAE production:
IMPORTANT: images.get_serving_url() also works for non-image files in the SDK!
In the SDK you still need the Blobstore to read a blob and create a serving URL for a Cloud Storage object.
I also added the code to read, write and upload cloudstorage blobs in the SDK and GAE production.
The code can be found here.
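A hedged sketch of what that can look like for an object in the default bucket, using the undocumented /_ah/gcs/ route in the SDK and the public storage.googleapis.com host in production (the SERVER_SOFTWARE check and port 8080 are my assumptions, not part of the answer above):
import os
from google.appengine.api import app_identity

def public_gcs_url(object_name):
    bucket = app_identity.get_default_gcs_bucket_name()
    # the dev server exposes objects on the local /_ah/gcs/ route
    if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
        return 'http://localhost:8080/_ah/gcs/%s/%s' % (bucket, object_name)
    # in production the default bucket is reachable on the public GCS host
    return 'http://storage.googleapis.com/%s/%s' % (bucket, object_name)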

This is the value that you see in the Development mode from app_identity_stub.py:
APP_DEFAULT_GCS_BUCKET_NAME = 'app_default_bucket'
The comments in this file explain it:
This service behaves the same as the production service, except using
constant values instead of app-specific values
You should get the correct URL in your production code.
EDIT:
This is from the support forum:
In development mode, the app engine tools simulate Google Cloud
Storage services locally. Objects in that simulated environment are
non-persistent so your app is failing because the desired object
doesn't exist in the local store. If you first create (and optionally
write to) the object you're trying to read, it should work fine in dev
mode (it did for me). Of course, objects in the production service are
persistent so there's no need for that extra step when running your
app in production mode (assuming the object already exists).
Hope that helps,
Marc Google Cloud Storage Team
This means you have to write a file first, then you can use it. If I understand correctly, you can use any bucket name for this purpose, including 'app_default_bucket'.
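For example, a minimal write-then-read round trip in the dev server could look like this (the bucket and object names are only illustrative, and pdf_data stands for whatever bytes you want to store):
import cloudstorage as gcs

gcs_path = '/app_default_bucket/example.pdf'
# create (and write to) the object first so it exists in the local simulated store
with gcs.open(gcs_path, 'w', content_type='application/pdf') as f:
    f.write(pdf_data)
# now reading it back works in dev mode
with gcs.open(gcs_path, 'r') as f:
    data = f.read()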

I was here earlier looking for answers and just wanted to share what I found, now that I have it working.
You can do this now, and it's only a little painful. Tricking the image or blobstore API isn't supported and doesn't seem to work any longer.
See:
https://cloud.google.com/storage/docs/access-control/signed-urls
https://cloud.google.com/storage/docs/access-control/create-signed-urls-gsutil
If you sign your URLs, you can give auto-expiring links to your content, for anonymous or paywalled consumption. You wouldn't want to serve your whole site this way, but for a PDF or whatnot, this is a valid and semi-secure option.
Missing from the documentation, you might need to drop the newline for the canonical extended headers. The storage endpoint will tell you what it expects when the signature is bad.
Also, your host should be: https://storage-download.googleapis.com/
If you're using App Engine, then the GoogleAccessId is: <projectname>@appspot.gserviceaccount.com
See: app_identity.get_service_account_name()
Example of how to generate the signature:
from google.appengine.api import app_identity
def signFile(path, verb='GET', md5='', contentType='', expiration=''):
    signatureRequest = '{}\n{}\n{}\n{}\n{}'.format(
        verb, md5, contentType, expiration, path)
    return app_identity.sign_blob(signatureRequest)
That returns a tuple of (key_name, binary_signature).
Now you need to construct the URL. The signature should be base64 encoded, then urlencoded. See the following for how to finish constructing the URL. You should probably use the download host mentioned above.
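Putting it together, a rough sketch of building the final URL (signedUrl, the 300-second expiry and the query-string layout are my own illustration, modeled on the example URL below; double-check it against the signed-URL docs linked above):
import base64
import time
import urllib

from google.appengine.api import app_identity

def signedUrl(bucket, objectName, expiresInSeconds=300):
    expiration = str(int(time.time()) + expiresInSeconds)
    path = '/{}/{}'.format(bucket, objectName)
    # sign the canonical request string built by signFile() above
    _, signature = signFile(path, expiration=expiration)
    return ('https://storage-download.googleapis.com{}'
            '?GoogleAccessId={}&Expires={}&Signature={}').format(
                path,
                app_identity.get_service_account_name(),
                expiration,
                urllib.quote(base64.b64encode(signature), safe=''))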
Example URL from the docs:
https://storage.googleapis.com/example-bucket/cat.jpeg?GoogleAccessId=example@example-project.iam.gserviceaccount.com&Expires=1458238630&Signature=VVUgfqviDCov%2B%2BKnmVOkwBR2olSbId51kSibuQeiH8ucGFyOfAVbH5J%2B5V0gDYIioO2dDGH9Fsj6YdwxWv65HE71VEOEsVPuS8CVb%2BVeeIzmEe8z7X7o1d%2BcWbPEo4exILQbj3ROM3T2OrkNBU9sbHq0mLbDMhiiQZ3xCaiCQdsrMEdYVvAFggPuPq%2FEQyQZmyJK3ty%2Bmr7kAFW16I9pD11jfBSD1XXjKTJzgd%2FMGSde4Va4J1RtHoX7r5i7YR7Mvf%2Fb17zlAuGlzVUf%2FzmhLPqtfKinVrcqdlmamMcmLoW8eLG%2B1yYW%2F7tlS2hvqSfCW8eMUUjiHiSWgZLEVIG4Lw%3D%3D
I hope this helps someone!
Oh yeah, you only need to do all the signature stuff if your bucket isn't publicly accessible (read-all).

Related

How to use dropbox with Django on heroku?

I'm fairly new to django.
Heroku doesn't support image storage, so I will have to use another service for it. I've found a lot of tutorials for using Amazon S3, but I would like to use Dropbox since it's free. Is this possible?
I've found this package https://django-storages.readthedocs.io/en/latest/ but I still don't understand how to use it. If anybody has used it please help me out. Thanks.
Sign up (if you haven’t already), go to the DropBox App Console, create a new application and generate the Access Token.
Use then the Python Dropbox SDK:
import dropbox

dbx = dropbox.Dropbox('access_token')

# upload a local file to Dropbox
local_path = 'local_files/file.json'
dropbox_path = '/file.json'
with open(local_path, 'rb') as f:
    dbx.files_upload(f.read(), dropbox_path, mute=True)

# download the file back from Dropbox
metadata, res = dbx.files_download(dropbox_path)
print(res.content)
See the Files on Heroku Medium post for the details and a few other options.

Persisting File to App Engine Blobstore

The App Engine documentation for the Blobstore gives a pretty thorough explanation of how to upload a file using the BlobstoreUploadHandler provided by the webapp framework.
However, I have a cgi.FieldStorage instance that I would like to store directly into the Blobstore. In other words, I don't need to upload the file since this is taken care of by other means; I just need to store it.
I've been looking through the blobstore module source to try to understand how the upload handler creates/generates blobstore keys and ultimately writes files to the blobstore itself, but I'm getting lost. It seems like the CreateUploadURLResponse in blobstore_service_pb is where the actual write would occur, but I'm not seeing the component that actually implements that functionality.
Update
There is also an implementation for storing files directly into the filesystem, which I think is what the upload handler does in the end. I am not entirely sure about this, so an explanation as to whether or not using the FileBlobStorage is the correct way to go would be appreciated.
After the deprecation of the files API you can no longer write directly to blobstore.
You should write to Google Cloud Storage instead. For that you can use the App Engine GCS client.
Files written to Google Cloud Storage can be served by the Blobstore API by creating a blob key.
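A minimal sketch of that flow, assuming the GAE cloudstorage client library is available and field_storage is your cgi.FieldStorage instance:
import cloudstorage as gcs
from google.appengine.api import app_identity
from google.appengine.ext import blobstore

def store_field_storage(field_storage, object_name):
    bucket = app_identity.get_default_gcs_bucket_name()
    gcs_path = '/%s/%s' % (bucket, object_name)
    # write the uploaded bytes to Google Cloud Storage
    with gcs.open(gcs_path, 'w',
                  content_type=field_storage.type or 'application/octet-stream') as f:
        f.write(field_storage.value)
    # create a blob key so the object can be served with the Blobstore API
    return blobstore.create_gs_key('/gs' + gcs_path)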

Storing multiple files with the same name in Google Cloud Storage?

So I am trying to port a Python webapp written with Flask to Google App Engine. The app hosts user uploaded files up to 200mb in size, and for non-image files the original name of the file needs to be retained. To prevent filename conflicts, e.g. two people uploading stuff.zip, each containing completely different and unrelated contents, the app creates a UUID folder on the filesystem, stores the file within it, and serves the files to users from there. Google Cloud Storage, which I was planning on using to store the user files in a bucket, according to its documentation has "no notion of folders". What is the best way to go about getting this same functionality with their system?
The current method, just for demonstration:
# generates a new folder with a shortened UUID name to save files
# other than images to avoid filename conflicts
else:
    # if there is a better way of doing this i'm not clever enough
    # to figure it out
    new_folder_name = shortuuid.uuid()[:9]
    new_folder_path = os.path.join(app.config['FILE_FOLDER'], new_folder_name)
    os.mkdir(new_folder_path)
    file.save(os.path.join(new_folder_path, filename))
    return url_for('uploaded_file', new_folder_name=new_folder_name)
From the Google Cloud Storage Client Library Overview documentation:
GCS and "subdirectories"
Google Cloud Storage documentation refers to "subdirectories" and the GCS client library allows you to supply subdirectory delimiters when you create an object. However, GCS does not actually store the objects into any real subdirectory. Instead, the subdirectories are simply part of the object filename. For example, if I have a bucket my_bucket and store the file somewhere/over/the/rainbow.mp3, the file rainbow.mp3 is not really stored in the subdirectory somewhere/over/the/. It is actually a file named somewhere/over/the/rainbow.mp3. Understanding this is important for using listbucket filtering.
While Cloud Storage does not support subdirectories per se, it allows you to use subdirectory delimiters inside filenames. This basically means that the path to your file will still look exactly as if it was inside a subdirectory, even though it is not. This apparently should concern you only when you're iterating over the entire contents of the bucket.
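So the UUID trick from the question carries over almost unchanged; here is a hedged sketch using the GAE cloudstorage client (shortuuid and the function name are only carried over from the question for illustration):
import cloudstorage as gcs
import shortuuid

def save_upload(bucket, filename, data, content_type):
    # the "folder" is nothing more than a prefix embedded in the object name
    prefix = shortuuid.uuid()[:9]
    object_name = '/%s/%s/%s' % (bucket, prefix, filename)
    with gcs.open(object_name, 'w', content_type=content_type) as f:
        f.write(data)
    # if the bucket allows public reads, the file is reachable as:
    # http://storage.googleapis.com/<bucket>/<prefix>/<filename>
    return object_name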
From the Request URIs documentation:
URIs for Standard Requests
For most operations you can use either of the following URLs to access objects:
storage.googleapis.com/<bucket>/<object>
<bucket>.storage.googleapis.com/<object>
This means that the public URL for their example would be http://storage.googleapis.com/my_bucket/somewhere/over/the/rainbow.mp3. Their service would interpret this as bucket=my_bucket and object=somewhere/over/the/rainbow.mp3 (i.e. no notion of subdirectories, just an object name with embedded slashes in it); the browser however will just see the path /my_bucket/somewhere/over/the/rainbow.mp3 and will interpret it as if the filename is rainbow.mp3.

Serving images directly from GCS in GAE using Blobstore API and Images API

Many questions and answers on Blobstore and Google Cloud Storage (GCS) are two or three years old, but things have changed dramatically since then. GCS is no longer a standalone service; it is now integrated into Google App Engine (GAE).
Google seems to push GCS so hard that Blobstore is deprecated, for example,
The Files API feature used here to write files to Blobstore has been
deprecated and is going to be removed at some time in the future, in
favor of writing files to Google Cloud Storage and using Blobstore to
serve them.
I believe it is high time to switch to GCS.
For example, www.example.com is a site built on GAE, while example.jpg is an image stored on GCS; I want to serve the image using the URL http://www.example.com/images/example.jpg
This used to be impossible, but now it is possible thanks to the integration.
I found this:
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
says:
When the Blobstore API is used together with the Images API, you get a
powerful way to serve images, because you can serve images directly
from GCS, bypassing the App Engine app, which saves on instance hour
costs.
I do not know how to 'bypass the App Engine app'. Is there any example of how to bypass GAE while serving images using the Blobstore API and Images API?
Instructions are here: https://developers.google.com/appengine/docs/python/images/functions#Image_get_serving_url
Start with an image hosted in Google Cloud Storage.
First, use the Blobstore API's create_gs_key() function to generate a blob key for your GCS image object. Then, pass that blob key into the Image API's get_serving_url() function.
The Image API will give you a special URL that skips over your app engine app and serves the image directly.
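As a short sketch of those two steps (the bucket and object names are placeholders):
from google.appengine.api import images
from google.appengine.ext import blobstore

# step 1: turn the GCS object into a blob key
gs_key = blobstore.create_gs_key('/gs/<bucket>/images/example.jpg')
# step 2: ask the Images API for a URL that serves the object directly, bypassing the app
serving_url = images.get_serving_url(gs_key)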
You actually do not need to use BlobStore at all now. The following will work to get the images API URL for serving images stored in GCS:
from google.appengine.api import images
images.get_serving_url(None, filename='/gs/<bucket>/<object>')
Serving images from 'www' is not a good idea if you are using www as your GAE CNAME; to serve images you can create a new sub-domain. In our case we are using cdn.example.com, and we serve our images like http://cdn.example.com/images/example.jpg
How do we do it:
Create a GCS bucket named cdn.example.com and place your images under the /images path.
Specify your index and 404 pages and it will be good to go.
More on this: https://cloud.google.com/storage/docs/website-configuration?hl=en

Python GAE upload to Google Docs

I need to upload a file/document to Google Docs on a GAE application. This should be simple enough, but I'm having a lot of trouble with the API.
The context:
import gdata.docs.service
client = gdata.docs.service.DocsService()
client.ClientLogin('gmail', 'pass')
ms = gdata.MediaSource(#what goes in here?)
client.Upload(media_source=ms, title='title')
To upload I'm using client.Upload(), which takes a MediaSource (wrapper) object as a parameter. However, MediaSource() seems to only accept a filepath for a document: 'C:/Docs/ex.doc'.
Since I'm on GAE with no filesystem, I can only access the file through the Blobstore or a direct URL to the file. But how do I input that into MediaSource()?
There seems to be a way in Java to accomplish this by using MediaByteArraySource(), but nothing for Python.
If anyone is curious, here's how I solved this problem using the Document List API.
I didn't want to use the Drive SDK since it does complicate a lot of things. It's much simpler with the List API to just authenticate/login without the need for some OAuth trickery. This is using version 2.0.14 of the gdata Python library, which is not the current version (2.0.17), but it seems to have a simpler upload mechanism.
There's also slightly more (still sparse) documentation online for 2.0.14, though I had to piece this together from various sources and trial & error. The downside is that you cannot upload pdf's with this version. This code will not work with 2.0.17.
Here's the code:
import gdata.docs.service
import gdata.docs.data
from google.appengine.api import urlfetch
# get file from url
result = urlfetch.fetch('http://example.com/test.docx')
headers = result.headers
data = result.content
# authenticate client object
client = gdata.docs.service.DocsService()
client.ClientLogin('gmail', 'password')
# create MediaSource file wrapper
ms = gdata.MediaSource(file_handle=result.content,
                       content_type=headers['content-type'],
                       content_length=int(headers['content-length']))
# upload to a specific folder, return URL of doc
google_doc_name = 'title'
folder_uri = '/feeds/folders/private/full/folder:j7XO8SJj...'
entry = client.Upload(ms, google_doc_name, folder_or_uri=folder_uri)
edit_url = entry.GetAlternateLink().href
The Google Drive SDK docs include a complete sample application written in Python that runs on App Engine:
https://developers.google.com/drive/examples/python
You can use it as reference for your implementation and to see how to save a file from App Engine.
