Some Google App Engine Blobstore Problems - Python

I have googled and read the docs on the official Google App Engine site about the Blobstore, but there are some problems that I still don't understand. My platform is webapp.
Docs I have read:
webapp Blobstore Handlers
Blobstore Python API Overview
The Blobstore Python API
After reading all these docs, I still have some problems:
In the Blobstore Python API Overview it says the maximum size of Blobstore data that can be read by the app with one API call is 1MB. What does this mean? Does this 1MB limit apply to send_blob()? Take the following code from webapp Blobstore Handlers as an example:
from google.appengine.ext.webapp import blobstore_handlers

class ViewPhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, photo_key):
        self.send_blob(photo_key)
Does that mean the photo (which is uploaded and stored in the Blobstore) associated with the photo_key must be less than 1MB? From the context, I don't think so. I think the photo can be as large as 2GB, but I am not sure.
How is the content type determined on send_blob()? Is it text/html or image/jpeg? Can I set it somewhere myself? The following explanation from webapp Blobstore Handlers is confusing, and quite difficult for a non-English speaker. Can someone paraphrase it with code samples? Where are the docs for send_blob()? I can't find them.
The send_blob() method accepts a save_as argument that determines whether the blob data is sent as raw response data or as a MIME attachment with a filename, which prompts web browsers to save the file with the given name instead of displaying it. If the value of the argument is a string, the blob is sent as an attachment, and the string value is used as the filename. If True and blob_key_or_info is a BlobInfo object, the filename from the object is used. By default, the blob data is sent as the body of the response and not as a MIME attachment.
There is a file http://www.example.com/example.avi which is 20MB or even 2GB. I want to fetch example.avi from the internet and store it in the Blobstore. I checked, and the urlfetch response size limit is 1MB. I searched and haven't found a solution.
Thanks a lot!

send_blob() doesn't involve your application reading the file through the API, so the 1MB limit doesn't apply. The frontend service that returns the response to the user reads the entire blob and returns all of it in the response (it most likely does the reading in chunks, but this is an implementation detail that you don't have to worry about).
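If your own code ever does need to read blob data, you can stay under the 1MB-per-call limit by reading in chunks. A minimal sketch, assuming a process() function of your own:

from google.appengine.ext import blobstore

def read_large_blob(blob_key):
    # BlobReader fetches the underlying data in buffer_size pieces,
    # so each individual API call stays under the 1MB limit.
    reader = blobstore.BlobReader(blob_key, buffer_size=1024 * 1024)
    for chunk in iter(lambda: reader.read(1024 * 1024), ''):
        process(chunk)  # process() is a placeholder for your own logic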
send_blob() sets the content type to either the blob's internally stored type or the type you specify with the optional content_type parameter of send_blob(). As for the documentation, it seems you need to RTFS; there's a docstring in the google.appengine.ext.webapp.blobstore_handlers package.
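To paraphrase the save_as passage with code, here is a hedged sketch (the handler names and filename are made up):

class ServePhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        # Sent as the response body and displayed inline by the browser,
        # with an explicitly chosen content type:
        self.send_blob(blob_key, content_type='image/jpeg')

class DownloadPhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        # save_as='photo.jpg' marks the blob as an attachment, so the
        # browser prompts to save it under that name instead of showing it.
        self.send_blob(blob_key, save_as='photo.jpg')

Passing save_as=True with a BlobInfo object would instead reuse the filename stored in the BlobInfo.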
There's really no great solution for fetching arbitrary files from the web and storing them in Blobstore. Most likely you'd need a service running elsewhere, like your own machine or an EC2 instance, to fetch the files and POST them to a blobstore handler in your application.
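For example, a rough sketch of the external fetcher, assuming your app exposes a handler at /get_upload_url that returns blobstore.create_upload_url(...) as plain text (both names are invented here):

import requests  # third-party HTTP client, running on your own machine

UPLOAD_URL_ENDPOINT = 'https://your-app.appspot.com/get_upload_url'

def mirror_to_blobstore(source_url):
    # Ask the app for a fresh one-time Blobstore upload URL...
    upload_url = requests.get(UPLOAD_URL_ENDPOINT).text
    # ...then re-post the remote file to it as a multipart upload.
    data = requests.get(source_url).content
    requests.post(upload_url, files={'file': ('example.avi', data)})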

Related

Persisting File to App Engine Blobstore

The App Engine documentation for the Blobstore gives a pretty thorough explanation of how to upload a file using the BlobstoreUploadHandler provided by the webapp framework.
However, I have a cgi.FieldStorage instance that I would like to store directly into the Blobstore. In other words, I don't need to upload the file since this is taken care of by other means; I just need to store it.
I've been looking through the blobstore module source to try to understand how the upload handler creates/generates blobstore keys and ultimately writes files to the blobstore itself, but I'm getting lost. It seems like the CreateUploadURLResponse in blobstore_service_pb is where the actual write would occur, but I'm not seeing the component that actually implements that functionality.
Update
There is also an implementation for storing files directly into the filesystem, which I think is what the upload handler does in the end. I am not entirely sure about this, so an explanation as to whether or not using the FileBlobStorage is the correct way to go would be appreciated.
After the deprecation of the Files API you can no longer write directly to the Blobstore.
You should write to Google Cloud Storage instead. For that you can use the App Engine GCS client library.
Files written to Google Cloud Storage can be served by the Blobstore API by creating a blob key.
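A minimal sketch of storing a cgi.FieldStorage straight to GCS and getting a Blobstore-servable key out of it (the helper name is made up, and the app's default bucket is assumed):

import cloudstorage as gcs
from google.appengine.api import app_identity
from google.appengine.ext import blobstore

def store_field_storage(field_storage):
    # Write the uploaded bytes to GCS under the app's default bucket.
    bucket = app_identity.get_default_gcs_bucket_name()
    filename = '/%s/%s' % (bucket, field_storage.filename)
    with gcs.open(filename, 'w', content_type=field_storage.type) as f:
        f.write(field_storage.file.read())
    # The Blobstore API can serve GCS objects via a '/gs'-prefixed key.
    return blobstore.create_gs_key('/gs' + filename)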

upload image file to datastore using endpoints in python?

I am trying to make a form for adding users that will store user information to my database.
I want to upload an image file using Cloud Endpoints (python). I do not have any idea how to do it.
What will be the input class (request class) and output class (response class)?
@endpoints.method(inputclass, outputclass,
                  path='anypath', http_method='GET',
                  name='apiname')
What url will I provide in the action= form field for uploading the image? How can I then show the file?
You have two ways to store data files (including image files).
The first one is to convert your image to base64 and store it in the Datastore (not the best).
The second one is to store your image file in Google Cloud Storage (that is the best) or the Blobstore (according to Google themselves, there is no good reason to use the Blobstore anymore).
So you should store your image file in Google Cloud Storage from your endpoint, as sketched below: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/?hl=fr
Personally, I use a servlet (App Engine) to store the image in GCS. My endpoint calls my servlet and passes the image as a parameter, and my servlet stores the image in GCS. It works very well =)
Hope this helps.
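For illustration, a hedged sketch of an Endpoints method that accepts a base64-encoded image and writes it to GCS; the message classes, bucket name and API names are all invented:

import base64

import cloudstorage as gcs
import endpoints
from protorpc import messages
from protorpc import remote

class ImageRequest(messages.Message):
    filename = messages.StringField(1, required=True)
    image_data = messages.StringField(2, required=True)  # base64 string

class ImageResponse(messages.Message):
    gcs_path = messages.StringField(1)

@endpoints.api(name='users', version='v1')
class UserApi(remote.Service):

    @endpoints.method(ImageRequest, ImageResponse,
                      path='uploadImage', http_method='POST',
                      name='uploadImage')
    def upload_image(self, request):
        # Bucket name is an assumption; substitute your own.
        path = '/my-bucket/%s' % request.filename
        with gcs.open(path, 'w', content_type='image/jpeg') as f:
            f.write(base64.b64decode(request.image_data))
        return ImageResponse(gcs_path=path)

Note there is no action= form URL in this setup: Endpoints doesn't take a browser form POST directly, so the client encodes the file and sends it through the API instead.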

How to serve cloudstorage files using app engine SDK

In App Engine I can serve Cloud Storage files, like a PDF, using the default bucket of my application:
http://storage.googleapis.com/<appid>.appspot.com/<file_name>
But how can I serve local cloudstorage files in the SDK, without making use of a blob_key?
I write to the default bucket like this:
import cloudstorage as gcs
from google.appengine.api import app_identity

gcs_file_name = '/%s/%s' % (app_identity.get_default_gcs_bucket_name(), file_name)
with gcs.open(gcs_file_name, 'w') as f:
    f.write(data)
The name of the default bucket in the SDK is 'app_default_bucket'.
In the SDK datastore I have a Kind: GsFileInfo showing: filename: /app_default_bucket/example.pdf
Update and workaround: You can get a serving URL for non-image files like CSS, JS and PDF:
from google.appengine.api import app_identity, images
from google.appengine.ext import blobstore
gs_file = '/gs/%s/%s/%s' % (app_identity.get_default_gcs_bucket_name(), folder, filename)
serving_url = images.get_serving_url(blobstore.create_gs_key(gs_file))
UPDATE: I found this feature to serve Cloud Storage files using the SDK (it has not been documented yet):
http://localhost:8080/_ah/gcs/app_default_bucket/filename
This means we do not need the image serving URL to serve non-images as shown below!
To create a serving URL for Cloud Storage files like images, CSS, JS and PDFs in the default bucket, I use this code for testing (SDK) and GAE production.
IMPORTANT: images.get_serving_url() also works for non-images in the SDK!
In the SDK you still need the Blobstore to read a blob and create a serving URL for a Cloud Storage object.
I also added the code to read, write and upload Cloud Storage blobs in the SDK and GAE production.
The code can be found here.
This is the value that you see in development mode, from app_identity_stub.py:
APP_DEFAULT_GCS_BUCKET_NAME = 'app_default_bucket'
The comments in this file explain it:
This service behaves the same as the production service, except using constant values instead of app-specific values.
You should get the correct URL in your production code.
EDIT:
This is from the support forum:
In development mode, the app engine tools simulate Google Cloud
Storage services locally. Objects in that simulated environment are
non-persistent so your app is failing because the desired object
doesn't exist in the local store. If you first create (and optionally
write to) the object you're trying to read, it should work fine in dev
mode (it did for me). Of course, objects in the production service are
persistent so there's no need for that extra step when running your
app in production mode (assuming the object already exists).
Hope that helps,
Marc Google Cloud Storage Team
This means you have to write a file first, then you can use it. If I understand correctly, you can use any bucket name for this purpose, including 'app_default_bucket'.
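For instance, a quick sketch for the dev server (the filename and contents are made up):

import cloudstorage as gcs

# The dev server's simulated GCS is non-persistent, so create the
# object in the current run before requesting it.
pdf_bytes = 'dummy pdf content'  # placeholder data
with gcs.open('/app_default_bucket/example.pdf', 'w',
              content_type='application/pdf') as f:
    f.write(pdf_bytes)
# Now http://localhost:8080/_ah/gcs/app_default_bucket/example.pdf serves it.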
I was here earlier looking for answers and just wanted to share what I found, now that I have it working.
You can do this now, and it's only a little painful. Tricking the image or blobstore API isn't supported and doesn't seem to work any longer.
See:
https://cloud.google.com/storage/docs/access-control/signed-urls
https://cloud.google.com/storage/docs/access-control/create-signed-urls-gsutil
If you sign your URLs, you can give auto-expiring links to your content, for anonymous or paywalled consumption. You wouldn't want to serve your whole site this way, but for a PDF or whatnot, this is a valid and semi-secure option.
Missing from the documentation, you might need to drop the newline for the canonical extended headers. The storage endpoint will tell you what it expects when the signature is bad.
Also, your host should be: https://storage-download.googleapis.com/
If you're using App Engine, then the GoogleAccessId is: <projectname>@appspot.gserviceaccount.com
See: app_identity.get_service_account_name()
Example of how to generate the signature:
from google.appengine.api import app_identity

def signFile(path, verb='GET', md5='', contentType='',
             expiration=''):
    # The canonical string to sign: verb, MD5, content type,
    # expiration and resource path, each separated by a newline.
    signatureRequest = '{}\n{}\n{}\n{}\n{}'.format(
        verb, md5, contentType, expiration, path)
    return app_identity.sign_blob(signatureRequest)
That returns a tuple of (signing key name, binary signature).
Now you need to construct the URL. The signature should be base64-encoded, then URL-encoded. See the following for how to finish constructing the URL. You should probably use the download host mentioned above.
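Putting it together, a hedged sketch of assembling the final URL with the signFile() helper above (the bucket/object names and five-minute expiry are illustrative):

import base64
import time
import urllib

from google.appengine.api import app_identity

DOWNLOAD_HOST = 'https://storage-download.googleapis.com'

def signedUrl(bucket, objectName, expiresIn=300):
    expiration = int(time.time()) + expiresIn
    path = '/%s/%s' % (bucket, objectName)
    _, signature = signFile(path, expiration=str(expiration))
    params = urllib.urlencode({
        'GoogleAccessId': app_identity.get_service_account_name(),
        'Expires': expiration,
        'Signature': base64.b64encode(signature),
    })
    return '%s%s?%s' % (DOWNLOAD_HOST, path, params)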
Example URL from the docs:
https://storage.googleapis.com/example-bucket/cat.jpeg?GoogleAccessId=example@example-project.iam.gserviceaccount.com&Expires=1458238630&Signature=VVUgfqviDCov%2B%2BKnmVOkwBR2olSbId51kSibuQeiH8ucGFyOfAVbH5J%2B5V0gDYIioO2dDGH9Fsj6YdwxWv65HE71VEOEsVPuS8CVb%2BVeeIzmEe8z7X7o1d%2BcWbPEo4exILQbj3ROM3T2OrkNBU9sbHq0mLbDMhiiQZ3xCaiCQdsrMEdYVvAFggPuPq%2FEQyQZmyJK3ty%2Bmr7kAFW16I9pD11jfBSD1XXjKTJzgd%2FMGSde4Va4J1RtHoX7r5i7YR7Mvf%2Fb17zlAuGlzVUf%2FzmhLPqtfKinVrcqdlmamMcmLoW8eLG%2B1yYW%2F7tlS2hvqSfCW8eMUUjiHiSWgZLEVIG4Lw%3D%3D
I hope this helps someone!
Oh yeah, you only need to do all the signature stuff if your bucket isn't publicly accessible (read-all).

Create a large .csv (or any other type!) file with google app engine

I've been struggling to create a file with GAE for two days now. I've examined different approaches, and each one seems more complex and time-consuming than the previous one.
I've tried simply loading a page and writing the file in to response object with relevant headers:
self.response.headers['Content-Disposition'] = "attachment; filename=titles.csv"
q = MyObject.all()
for obj in q:
    title = json.loads(obj.data).get('title')
    self.response.out.write(title.encode('utf8') + "\n")
This tells me (in a very long error) that Full proto too large to save, cleared variables. Here's the full error.
I've also checked Cloud Storage, but it needs tons of info and tweaking in the Cloud Console just to get enabled, and the Blobstore, which can save stuff only into the Datastore.
Writing a file can't be this complicated! Please tell me that I am missing something.
That error doesn't have anything to do with writing a CSV, but appears to be a timeout when iterating over all MyObject entities. Remember that requests in GAE are subject to strict limits, and you are probably exceeding those. You probably want to use a cursor and the deferred API to build up your CSV in stages. But for that, you definitely will need to write to the blobstore or CS.
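A hedged sketch of that approach, assuming MyObject is a db.Model as in the question and an invented bucket/object name; note that deferred task payloads are size-limited, so very large result sets would need per-batch files instead:

import json

import cloudstorage as gcs
from google.appengine.ext import deferred

CSV_PATH = '/my-bucket/titles.csv'  # assumed bucket/object name
BATCH = 500

def build_csv(cursor=None, rows=None):
    rows = rows or []
    q = MyObject.all()
    if cursor:
        q.with_cursor(cursor)
    batch = q.fetch(BATCH)
    rows.extend(json.loads(obj.data).get('title', u'') for obj in batch)
    if len(batch) == BATCH:
        # More entities may remain: requeue ourselves with the new cursor.
        deferred.defer(build_csv, q.cursor(), rows)
    else:
        with gcs.open(CSV_PATH, 'w', content_type='text/csv') as f:
            f.write(u'\n'.join(rows).encode('utf8'))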

How to serve url to audio in the gae blobstore

I have audio files stored as blobs in Google App Engine's Blobstore. I'm not sure how to get a good URL to pass to the client side to play the blob. I would like to do something like the images library:
images.get_serving_url()
But there is no audio module. So is there a good way to get a URL from a blob to play audio, or even better, any media?
The rendering of an image is done by the browser. It's the same for audio: the browser decides what to do with a resource you point it to. For that, you need to send the correct MIME type [1] header. If the file already had the correct MIME type set when it was uploaded, you don't need to do this manually.
As for serving the blob, you need to create a blobstore download handler:
http://code.google.com/appengine/docs/python/tools/webapp/blobstorehandlers.html#BlobstoreDownloadHandler
[1] http://en.wikipedia.org/wiki/Internet_media_type
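A minimal download-handler sketch (the webapp route and URL scheme are assumptions), relying on the content type stored with the blob:

import urllib

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class ServeAudioHandler(blobstore_handlers.BlobstoreDownloadHandler):
    # Mapped to a route such as ('/serve/([^/]+)', ServeAudioHandler)
    def get(self, resource):
        blob_key = blobstore.BlobKey(urllib.unquote(resource))
        # send_blob streams the blob with its stored content type, so an
        # <audio src="/serve/..."> tag can play it directly.
        self.send_blob(blob_key)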
I think what you're looking for is something like how S3 works, where the blobs you upload are automatically given a URL that can then be dropped directly in to the browser. Blobstore was designed to primarily give developers control over their URLs and fine grained control over access to the blobs. It does not have the facility to simply provide a URL based on, say, the blob reference. I think schuppe's answer is correct in describing what you need to do.
If you are interested in simply serving a blob to a user without any kind of authentication or restriction, it's not that hard to write a handler. The one that is in the documentation that schuppe referred you to will work ok, however, be careful, because it could open your app up to certain types of DOS attacks. Also, if you do it as the documentation does it, anyone who has one of your blob-reference strings can access any blob throughout your whole application, whether you mean to or not. Therefore you should build some additional access control around it.
Of course, if you're not concerned with controlling access to the data, that solution is simple and will work fine.
