Persisting File to App Engine Blobstore - python

The App Engine documentation for the Blobstore gives a pretty thorough explanation of how to upload a file using the BlobstoreUploadHandler provided by the webapp framework.
However, I have a cgi.FieldStorage instance that I would like to store directly into the Blobstore. In other words, I don't need to upload the file since this is taken care of by other means; I just need to store it.
I've been looking through the blobstore module source to try to understand how the upload handler creates/generates blobstore keys and ultimately writes files to the blobstore itself, but I'm getting lost. It seems like the CreateUploadURLResponse in blobstore_service_pb is where the actual write would occur, but I'm not seeing the component that actually implements that functionality.
Update
There is also an implementation for storing files directly into the filesystem, which I think is what the upload handler does in the end. I am not entirely sure about this, so an explanation as to whether or not using the FileBlobStorage is the correct way to go would be appreciated.

After the deprecation of the Files API you can no longer write directly to the Blobstore.
You should write to Google Cloud Storage instead. For that you can use the App Engine GCS client library.
Files written to Google Cloud Storage can be served by the Blobstore API by creating a blob key.
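A minimal sketch of that flow, assuming the cloudstorage client library is bundled with the app and that `field` is the cgi.FieldStorage instance from the question. The names gcs_object_path and store_upload are made up for illustration and are not part of any API. The App Engine imports sit inside the function so the path helper can be tried outside the runtime:

```python
def gcs_object_path(bucket, filename):
    """Build the '/bucket/object' path that the GCS client expects."""
    return '/%s/%s' % (bucket, filename.lstrip('/'))


def store_upload(field, bucket):
    """Write a cgi.FieldStorage upload to GCS and return a blob key for it.

    Hypothetical helper: only works inside the App Engine runtime.
    """
    # Imported here because these modules only exist on App Engine.
    import cloudstorage as gcs
    from google.appengine.ext import blobstore

    path = gcs_object_path(bucket, field.filename)
    with gcs.open(path, 'w', content_type=field.type) as f:
        f.write(field.file.read())
    # The Blobstore API addresses GCS objects as '/gs/<bucket>/<object>'.
    return blobstore.create_gs_key('/gs' + path)
```

The returned blob key can then be handed to the Blobstore serving APIs in the same way as a key produced by an upload handler.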

Related

How to serve cloudstorage files using app engine SDK

In app engine I can serve cloudstorage files like a pdf using the default bucket of my application:
http://storage.googleapis.com/<appid>.appspot.com/<file_name>
But how can I serve local cloudstorage files in the SDK, without making use of a blob_key?
I write to the default bucket like this:
gcs_file_name = '/%s/%s' % (app_identity.get_default_gcs_bucket_name(), file_name)
with gcs.open(gcs_file_name, 'w') as f:
    f.write(data)
The name of the default bucket in the SDK = 'app_default_bucket'
In the SDK datastore I have a Kind: GsFileInfo showing: filename: /app_default_bucket/example.pdf
Update and workaround: You can get a serving url for NON image files like css, js and pdf.
gs_file = '/gs/%s/%s/%s' % (app_identity.get_default_gcs_bucket_name(), folder, filename)
serving_url = images.get_serving_url(blobstore.create_gs_key(gs_file))
UPDATE I found this feature to serve cloudstorage files using the SDK:
This feature has not been documented yet.
http://localhost:8080/_ah/gcs/app_default_bucket/filename
This means we do not need the image serving URL to serve NON images as shown below!
To create a serving URL for cloudstorage files like images, css, js and pdfs in the default bucket, I use this code for testing (SDK) and GAE production:
IMPORTANT: images.get_serving_url() also works for NON images in the SDK!
In the SDK you still need the blobstore to read a blob and create a serving URL for a cloudstorage object.
I also added the code to read, write and upload cloudstorage blobs in the SDK and GAE production.
The code can be found here.
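As a rough illustration of the two URL shapes mentioned in this thread (the production storage.googleapis.com URL and the undocumented /_ah/gcs/ path of the SDK), here is a small helper. The function name and the dev_host default are assumptions of mine, and the production URL only works when the object is publicly readable:

```python
def gcs_public_url(bucket, object_name, dev_server=False,
                   dev_host='localhost:8080'):
    """Return a URL for a GCS object, for the SDK or for production.

    Hypothetical helper: dev_host is whatever host:port the SDK runs on.
    """
    if dev_server:
        # Local simulation served by the development server.
        return 'http://%s/_ah/gcs/%s/%s' % (dev_host, bucket, object_name)
    return 'http://storage.googleapis.com/%s/%s' % (bucket, object_name)


print(gcs_public_url('app_default_bucket', 'example.pdf', dev_server=True))
```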
This is the value that you see in the Development mode from app_identity_stub.py:
APP_DEFAULT_GCS_BUCKET_NAME = 'app_default_bucket'
The comments in this file explain it:
This service behaves the same as the production service, except using
constant values instead of app-specific values
You should get the correct URL in your production code.
EDIT:
This is from the support forum:
In development mode, the app engine tools simulate Google Cloud
Storage services locally. Objects in that simulated environment are
non-persistent so your app is failing because the desired object
doesn't exist in the local store. If you first create (and optionally
write to) the object you're trying to read, it should work fine in dev
mode (it did for me). Of course, objects in the production service are
persistent so there's no need for that extra step when running your
app in production mode (assuming the object already exists).
Hope that helps,
Marc Google Cloud Storage Team
This means you have to write a file first, then you can use it. If I understand correctly, you can use any bucket name for this purpose, including 'app_default_bucket'.
I was here earlier looking for answers and just wanted to share what I found, now that I have it working.
You can do this now, and it's only a little painful. Tricking the image or blobstore API isn't supported and doesn't seem to work any longer.
See:
https://cloud.google.com/storage/docs/access-control/signed-urls
https://cloud.google.com/storage/docs/access-control/create-signed-urls-gsutil
If you sign your URLs, you can give auto-expiring links to your content, for anonymous or paywalled consumption. You wouldn't want to serve your whole site this way, but for a PDF or whatnot, this is a valid and semi-secure option.
One thing missing from the documentation: you might need to drop the newline for the canonical extended headers. The storage endpoint will tell you what it expects when the signature is bad.
Also, your host should be: https://storage-download.googleapis.com/
If you're using App Engine, then the GoogleAccessId is: <projectname>@appspot.gserviceaccount.com
See: app_identity.get_service_account_name()
Example of how to generate the signature:
from google.appengine.api import app_identity

def signFile(path, verb='GET', md5='', contentType='',
             expiration=''):
    signatureRequest = '{}\n{}\n{}\n{}\n{}'.format(
        verb, md5, contentType, expiration, path)
    return app_identity.sign_blob(signatureRequest)
That returns a tuple of (signing key name, binary signature).
Now you need to construct the URL. The signature should be base64 encoded, then urlencoded. See the following for how to finish constructing the URL. You should probably use the download host mentioned above.
Example URL from the docs:
https://storage.googleapis.com/example-bucket/cat.jpeg?GoogleAccessId=example@example-project.iam.gserviceaccount.com&Expires=1458238630&Signature=VVUgfqviDCov%2B%2BKnmVOkwBR2olSbId51kSibuQeiH8ucGFyOfAVbH5J%2B5V0gDYIioO2dDGH9Fsj6YdwxWv65HE71VEOEsVPuS8CVb%2BVeeIzmEe8z7X7o1d%2BcWbPEo4exILQbj3ROM3T2OrkNBU9sbHq0mLbDMhiiQZ3xCaiCQdsrMEdYVvAFggPuPq%2FEQyQZmyJK3ty%2Bmr7kAFW16I9pD11jfBSD1XXjKTJzgd%2FMGSde4Va4J1RtHoX7r5i7YR7Mvf%2Fb17zlAuGlzVUf%2FzmhLPqtfKinVrcqdlmamMcmLoW8eLG%2B1yYW%2F7tlS2hvqSfCW8eMUUjiHiSWgZLEVIG4Lw%3D%3D
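The last step (base64 encode the signature, then urlencode it into the query string) could look roughly like this. It is only a sketch: build_signed_url is a name I made up, `signature` is assumed to be the raw bytes from the second element of the sign_blob() tuple, and `path` is assumed to be the same '/bucket/object' string that was signed:

```python
import base64

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_signed_url(path, access_id, expiration, signature,
                     host='https://storage-download.googleapis.com'):
    """Assemble a signed URL from already-computed parts.

    Hypothetical helper: it does not sign anything itself.
    """
    encoded_signature = base64.b64encode(signature).decode('ascii')
    # urlencode percent-escapes '+', '/' and '=' in the base64 text.
    query = urlencode([
        ('GoogleAccessId', access_id),
        ('Expires', str(expiration)),
        ('Signature', encoded_signature),
    ])
    return host + path + '?' + query
```

Note that urlencode also escapes the '@' in the access id as %40, which is equivalent to the unescaped form shown in the docs example.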
I hope this helps someone!
Oh yeah, you only need to do all the signature stuff if your bucket isn't publicly accessible (read-all).

Serving images directly from GCS in GAE using Blobstore API and Images API

Many questions and answers on Blobstore and Google Cloud Storage (GCS) are two or three years old, and things have changed dramatically in that time. GCS is no longer a standalone service; it is now integrated into Google App Engine (GAE).
Google seems to be pushing GCS so hard that Blobstore is being deprecated. For example:
The Files API feature used here to write files to Blobstore has been
deprecated and is going to be removed at some time in the future, in
favor of writing files to Google Cloud Storage and using Blobstore to
serve them.
I believe it is high time to switch to GCS.
For example, www.example.com is a site built on GAE and example.jpg is an image stored on GCS. I want to serve the image using the URL http://www.example.com/images/example.jpg
This used to be impossible, but now it is possible thanks to the integration.
I found this:
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
says:
When the Blobstore API is used together with the Images API, you get a
powerful way to serve images, because you can serve images directly
from GCS, bypassing the App Engine app, which saves on instance hour
costs.
I do not know how to 'bypass the App Engine app'. Is there any example of how to bypass GAE while serving the images using the Blobstore API and Images API?
Instructions are here: https://developers.google.com/appengine/docs/python/images/functions#Image_get_serving_url
Start with an image hosted in Google Cloud Storage.
First, use the Blobstore API's create_gs_key() function to generate a blob key for your GCS image object. Then, pass that blob key into the Image API's get_serving_url() function.
The Image API will give you a special URL that skips over your app engine app and serves the image directly.
You actually do not need to use BlobStore at all now. The following will work to get the images API URL for serving images stored in GCS:
from google.appengine.api import images
images.get_serving_url(None, filename='/gs/<bucket>/<object>')
Serving images from 'www' is not a good idea if you are using www as your GAE CNAME. To serve images you can create a new subdomain. In our case we are using cdn.example.com, and we serve our images like http://cdn.example.com/images/example.jpg
How do we do it?
Create a GCS bucket named cdn.example.com and place your images under the /images path.
Specify your index and 404 pages and you will be good to go.
More on this: https://cloud.google.com/storage/docs/website-configuration?hl=en

Serve a txt file programmatically with gae python

I know that I can't write files into the google app engine system, but I wonder if from the datastore I could programmatically build a txt file and serve it directly to download to the user of the application. I am not storing the file. I just want to serve it.
Any idea if this is possible?
Yes, it's possible.
You need to set the header to indicate that the file must be an attachment.
import webapp2

class MainHandler(webapp2.RequestHandler):
    def test_download(self):
        # Mark the response as an attachment so the browser downloads it.
        self.response.headers.add_header('content-disposition', 'attachment', filename='text.txt')
        self.response.write("hello world")
You can see more information looking at the source for webapp2
Regarding "can't write files into the google app engine system", you can write to the blobstore instead. So if you need to generate a large file, you write it to the blobstore and serve it from there.

How to serve url to audio in the gae blobstore

I have audio files stored as blobs in Google App Engine's blobstore. I'm not sure how to get a good URL to pass to the client side to play the blob. I would like to do something like the images library.
images.get_serving_url()
But, there is no audio module. So, is there a good way to get a url from a blob to play audio or even better any media?
The rendering of an image is done by the browser. It's the same for audio, the browser decides what to do with a resource you point it to. For that, you need to add the correct mime type[1] header. If the file already had the correct mime type set when being uploaded you don't need to do this manually.
As for serving the blob, you need to create a blobstore download handler:
http://code.google.com/appengine/docs/python/tools/webapp/blobstorehandlers.html#BlobstoreDownloadHandler
[1] http://en.wikipedia.org/wiki/Internet_media_type
I think what you're looking for is something like how S3 works, where the blobs you upload are automatically given a URL that can then be dropped directly in to the browser. Blobstore was designed to primarily give developers control over their URLs and fine grained control over access to the blobs. It does not have the facility to simply provide a URL based on, say, the blob reference. I think schuppe's answer is correct in describing what you need to do.
If you are interested in simply serving a blob to a user without any kind of authentication or restriction, it's not that hard to write a handler. The one that is in the documentation that schuppe referred you to will work ok, however, be careful, because it could open your app up to certain types of DOS attacks. Also, if you do it as the documentation does it, anyone who has one of your blob-reference strings can access any blob throughout your whole application, whether you mean to or not. Therefore you should build some additional access control around it.
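One possible way to add that kind of access control (this is my own illustration, not something from the documentation) is to hand out short-lived tokens tied to a blob key and check them in the download handler before calling send_blob(). A sketch using only the standard library, with a placeholder secret and made-up function names:

```python
import hashlib
import hmac

# Placeholder only: a real app would load its own secret from configuration.
SECRET = b'replace-with-a-real-secret'


def make_token(blob_key, expires, secret=SECRET):
    """Return a hex token binding a blob key to an expiry timestamp."""
    message = ('%s:%d' % (blob_key, expires)).encode('utf-8')
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def token_is_valid(blob_key, expires, token, now, secret=SECRET):
    """Check that the token matches this blob key and has not expired."""
    if now > expires:
        return False
    expected = make_token(blob_key, expires, secret)
    # Constant-time comparison to avoid leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

The handler would then read the blob key, expiry and token from the request, and return a 403 instead of serving the blob when token_is_valid() is False.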
Of course, if you're not concerned with controlling access to the data, that solution is simple and will work fine.

Some Google App Engine BlobStore Problems

I have googled and read the docs on the official Google App Engine site about Blobstore, but there are some problems that I still don't understand. My platform is webapp.
Docs I have read:
webapp Blobstore Handlers
Blobstore Python API Overview
The Blobstore Python API
After reading all these docs, I still have some problems:
The Blobstore Python API Overview says the maximum size of Blobstore data that can be read by the app with one API call is 1MB. What does this mean? Does this 1MB limit apply to send_blob()? Take the following code from webapp Blobstore Handlers as an example:
class ViewPhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, photo_key):
        self.send_blob(photo_key)
Does that mean the photo (which is uploaded and stored in the blobstore) associated with the photo_key must be less than 1MB? From the context, I don't think so. I think the photo can be as large as 2GB. But I am not sure.
How is the Content-Type determined on send_blob()? Is it text/html or image/jpeg? Can I set it myself somewhere? The following explanation from webapp Blobstore Handlers is confusing and quite difficult for a non-English speaker. Can someone paraphrase it with code samples? Where are the docs for send_blob()? I can't find them.
The send_blob() method accepts a save_as argument that determines whether the blob data is sent as raw response data or as a MIME attachment with a filename, which prompts web browsers to save the file with the given name instead of displaying it. If the value of the argument is a string, the blob is sent as an attachment, and the string value is used as the filename. If True and blob_key_or_info is a BlobInfo object, the filename from the object is used. By default, the blob data is sent as the body of the response and not as a MIME attachment.
There is a file http://www.example.com/example.avi which is 20MB or even 2GB. I want to fetch example.avi from the internet and store it in the Blobstore. I checked, and the urlfetch request size limit is 1MB. I searched and haven't found a solution.
Thanks a lot!
send_blob() doesn't involve your application reading the file from the API, so the 1MB limit doesn't apply. The frontend service that returns the response to the user will read the entire blob and return all of it in the response (it most likely does the reading in chunks, but this is an implementation detail that you don't have to worry about).
send_blob() sets the content type to either the blob's internally stored type, or the type you specify with the optional content_type parameter to send_blob(). For the documentation, it seems you need to read the source; there's a docstring in the google.appengine.ext.webapp.blobstore_handlers module.
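For contrast, the 1MB limit does matter when the app itself reads blob data, e.g. with blobstore.fetch_data(), which takes inclusive start and end indexes. A sketch of reading a blob in pieces under that limit follows. The helper names and the 1,000,000 byte chunk size are my own choices (the SDK also exposes blobstore.MAX_BLOB_FETCH_SIZE), and the App Engine import is kept inside the function so the range helper can run anywhere:

```python
def chunk_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte ranges covering total_size bytes."""
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def read_whole_blob(blob_info, chunk_size=1000000):
    """Read a blob into memory with one fetch_data() call per chunk.

    Hypothetical helper: only works inside the App Engine runtime.
    """
    from google.appengine.ext import blobstore

    parts = [blobstore.fetch_data(blob_info.key(), start, end)
             for start, end in chunk_ranges(blob_info.size, chunk_size)]
    return b''.join(parts)


print(list(chunk_ranges(10, 4)))
```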
There's really no great solution for fetching arbitrary files from the web and storing them in Blobstore. Most likely you'd need a service running elsewhere, like your own machine or an EC2 instance, to fetch the files and POST them to a blobstore handler in your application.
