Google App Engine: upload file and other fields in the same request - python

I want to upload an image to the blobstore, because I want to support files larger than 1MB. Now, the only way I can find is for the client to issue a POST where it sends the metadata, like geo-location, tags, and what-not, which the server puts in an entity. In this entity the server also puts the key of the blob where the actual image data is going to be stored, and the server concludes the request by returning to the client the URL returned by create_upload_url(). This works fine; however, it can leave the data inconsistent: if the second request is never issued, the blob is never filled, and the entity ends up pointing to an empty blob.
The only solution to this problem I can see is to trigger a deferred task which checks whether the blob was ever filled with an upload. I'm not a big fan of this solution, so I'm asking whether anybody has a better one in mind.
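For reference, the deferred-task fallback might look roughly like the minimal sketch below, assuming an ndb metadata entity that already stores the blob key, as described above; the Photo model, its properties, and the one-hour countdown are illustrative, not from the original post.

```python
# Minimal sketch of the deferred check: if the blob the entity points at was never
# uploaded, drop the orphaned entity. The Photo model and the delay are illustrative.
from google.appengine.ext import blobstore, deferred, ndb


class Photo(ndb.Model):               # hypothetical metadata entity
    blob_key = ndb.BlobKeyProperty()  # set when the entity is created, per the flow above
    tags = ndb.StringProperty(repeated=True)


def check_blob_filled(photo_key_urlsafe):
    photo = ndb.Key(urlsafe=photo_key_urlsafe).get()
    if photo is None:
        return
    if photo.blob_key is None or blobstore.BlobInfo.get(photo.blob_key) is None:
        photo.key.delete()            # the second request never happened; drop the orphan


# Scheduled in the first request, right after the entity is stored and
# create_upload_url() is returned to the client:
# deferred.defer(check_blob_filled, photo.key.urlsafe(), _countdown=3600)
```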

I went through exactly the same thought process, but in Java, and ended up using Apache Commons FileUpload. I'm not familiar with Python, but you'll just need a way of handling a multipart/form-data upload.
I upload the image and my additional fields together, using jQuery to assemble the multipart form data, which I then POST to my server.
On the server side I then take the file and write it to Google Cloud Storage using the Google Cloud Storage client library (Python link). This can be done in one chunk, or 'streamed' if it's a large file. Once it's in GCS, your App Engine app can read it using the same library, or you can serve it directly with a public URL, depending on the ACL you set.
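In Python on App Engine, the server-side half of that approach might look roughly like the sketch below, assuming a webapp2 handler and the GoogleAppEngineCloudStorageClient (cloudstorage) library; the bucket, route, and field names are placeholders.

```python
# Sketch of a webapp2 handler that accepts a multipart/form-data POST carrying an
# image plus extra form fields, and writes the image into Google Cloud Storage.
# 'my-bucket', '/upload', 'image', 'tags' and 'geo' are placeholders.
import cloudstorage as gcs
import webapp2

BUCKET = '/my-bucket'


class UploadHandler(webapp2.RequestHandler):
    def post(self):
        image = self.request.POST['image']   # cgi.FieldStorage for the file part
        tags = self.request.get('tags')      # the other multipart fields arrive as usual
        geo = self.request.get('geo')

        gcs_filename = '%s/%s' % (BUCKET, image.filename)
        with gcs.open(gcs_filename, 'w', content_type=image.type) as dest:
            dest.write(image.file.read())    # or copy in chunks for very large files

        # ...store gcs_filename, tags and geo in a datastore entity here...
        self.response.write('stored %s' % gcs_filename)


app = webapp2.WSGIApplication([('/upload', UploadHandler)])
```

Note that a request routed through the app like this is subject to App Engine's 32MB request limit, so very large files still need the streamed or resumable route mentioned above.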

Related

What credentials should I use to allow the users to upload directly to GCS?

I am making an application in GAE (python) which allows users (who must be logged in with a Google Account) to upload files to Google Cloud Storage directly. The upload would happen asynchronously (a button would execute some javascript code to send the XHR PUT request with the file).
I decided to use the JSON API, and I'm trying to do a multipart upload. According to the docs I need to include an Authorization header with a token. I was wondering, since I don't need access to anything in the user's account (like calendar or that kind of thing), what would be the best way to achieve this?
Is it possible to create a temporary access token from the application default credentials and send that to the user within the html file to use it then from the js function? If not, what should I do?
You would need to ask the user to grant you the Google Cloud Storage write scopes (which I would strongly recommend any Google Cloud Storage user not grant to any random application). You would also need to grant the end users write permission on your bucket (which also means they could delete everything in it). Unless you know all of your end users up front, this means you would need to make the bucket publicly writable, which I doubt is something you want to do.
I strongly recommend delegating access via your service account instead, though unfortunately the JSON API does not currently support any form of authentication delegation. However, the XML API does support Signed URLs.
https://cloud.google.com/storage/docs/access-control/signed-urls
You only need to use signed URLs for the client-side uploads; everything else can use the JSON API.
There are three options:
1) Just sign a simple PUT request (sketched below):
https://cloud.google.com/storage/docs/xml-api/put-object-upload#query_string_parameters
2) Use a form POST and sign a policy document:
https://cloud.google.com/storage/docs/xml-api/post-object#policydocument
3) Initiate a resumable upload server-side and pass the upload URL back to the client. I would only recommend this option if being able to resume the upload is important (e.g. for large uploads).
https://cloud.google.com/storage/docs/access-control/signed-urls#signing-resumable
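As a rough illustration of option 1, here is a minimal sketch assuming the google-cloud-storage Python client library and a service account with write access to the bucket; the function, bucket, object, and key-file names are illustrative. The server hands the signed URL to the logged-in user, whose JavaScript PUTs the file bytes straight to it.

```python
# Sketch of option 1: the server signs a short-lived PUT URL for one object, and the
# browser uploads directly to GCS with an XHR PUT. Names and the 15-minute expiry
# are placeholders.
import datetime

from google.cloud import storage


def make_upload_url(bucket_name, object_name, content_type='application/octet-stream'):
    # Signing requires a private key, hence the explicit service-account JSON here.
    client = storage.Client.from_service_account_json('service-account.json')
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        expiration=datetime.timedelta(minutes=15),
        method='PUT',
        content_type=content_type,  # the client must send the same Content-Type header
    )


# The logged-in user's JavaScript then does roughly:
#   xhr.open('PUT', signed_url); xhr.setRequestHeader('Content-Type', ...); xhr.send(file);
```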

Securely transfer a bunch of files from one Heroku app to another

I need to set up a Heroku app (Python) which would perform scheduled tasks that include fetching a set of data files (.csv and .html) from another Heroku app (RoR) and returning a result back to that app.
Also, only my app should be able to connect to the RoR app, because it deals with sensitive information. There would be 20 to 100 files each time, so I want them compressed somehow to transfer them quickly (to avoid bothering the server for too long).
I'm interested in possible ways to accomplish this. My first thought is to send an HTTP GET request to the RoR app and fetch the necessary files, but that by itself is not secure at all. Would SCP work in this situation, or do you have any other ideas?
Thanks in advance!
I would suggest writing a secured JSON or XML API to transfer the data from app to app. Once the data is received, I would then generate the .csv or .html files from the received data. It keeps things clean and easy to modify for future revisions, because now you'll have an API to interact with.
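On the Python side, calling such an API might look roughly like this minimal sketch, assuming a shared secret configured on both apps, HTTPS throughout, and a RoR endpoint that checks the same token; the URL, header, and environment-variable names are placeholders.

```python
# Sketch of the Python (fetching) side of such an API: a shared secret over HTTPS,
# JSON payloads, and gzip on the wire. URL and names are placeholders.
import os

import requests

API_URL = 'https://ror-app.example.com/api/v1/documents'
TOKEN = os.environ['TRANSFER_API_TOKEN']  # same secret configured on the RoR side
HEADERS = {'Authorization': 'Bearer %s' % TOKEN}


def fetch_documents():
    # requests advertises and transparently decodes gzip, which keeps 20-100 small
    # files reasonably quick to pull down in one response.
    resp = requests.get(API_URL, headers=HEADERS, timeout=60)
    resp.raise_for_status()
    return resp.json()  # e.g. [{"name": "report.csv", "content": "..."}, ...]


def push_results(results):
    resp = requests.post(API_URL, json=results, headers=HEADERS, timeout=60)
    resp.raise_for_status()
```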

Can I upload a file to GCS from Google Endpoints?

I'm trying to upload a file from a REST API (Google Cloud Endpoints) to GCS, but I keep getting errors. I don't know whether I'm going about it the wrong way or whether Google Endpoints simply can't handle file uploads.
I want my customers to be able to upload files to my project's bucket.
I read "Endpoints doesn't accept the multipart/form-data encoding so you can't upload the image directly to Endpoints".
Mike answered me in this post, but I don't know how to implement that in my project.
I'm using this library (Python):
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
If it's possible, what's the best way? Any example?
Thanks so much.
I think what Mike means in the previous post is that you should use the Blobstore API to upload the file to GCS, instead of sending it through Endpoints and then moving the data into the Blobstore.
But that depends on your client platform. If you have a web-based client, you should use the ordinary way, just as Mike explained (an HTML form and an upload handler). If you have an Android or other mobile client, you can use the GCS client library or the GCS REST API.
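For a web-based client, that "ordinary way" might look roughly like the sketch below; passing gs_bucket_name to create_upload_url makes the uploaded file land in your GCS bucket rather than the Blobstore. The '/upload' route and bucket name are placeholders.

```python
# Sketch of the Blobstore-based upload flow pointed at a GCS bucket.
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        upload_url = blobstore.create_upload_url(
            '/upload', gs_bucket_name='my-project-bucket')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>' % upload_url)


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload = self.get_uploads('file')[0]   # BlobInfo referencing the stored object
        # get_file_infos() additionally exposes the /gs/<bucket>/<object> name
        # if you want to read the file back with the GCS client library.
        self.response.write('uploaded %s' % upload.key())


app = webapp2.WSGIApplication([('/', UploadFormHandler), ('/upload', UploadHandler)])
```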

How to do chunk deduplication-based upload in Dropbox API?

I am using the Dropbox API (Python version) and want to replicate one piece of functionality from the Dropbox client-side software.
In Dropbox API, I can call a function like put_file() to upload the file to my Dropbox account.
Dropbox actually implemented per-user deduplication mechanism, which means that you need to transmit the chunk/file hash to the server before transmitting the chunk/file to the server.
If you uploaded a file F before and the server now finds a hash match, you don't need to transmit the chunk/file again.
put_file() seems to upload the file every time and does not do any chunking.
I also found upload_chunk(), which looked potentially useful, but it doesn't seem to do what I need.
I am wondering how can I do the chunk-based deduplication with Dropbox API?
(for example, I can upload the hash of a particular chunk, and the server will reply me whether there is a hash match)
According to this announcement the purpose of chunked upload is to make it possible to deal with spotty connections by letting you upload a large file in chunks instead. It's not about deduplication.
If you take a look through the Core API documentation (not that much to read, really), there is no mention anywhere of de-duplication being offered through the API. Whether you use Python or any other language or library, without the published API supporting de-duplication, there is no way you can access this functionality.
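For completeness, chunked upload in the 1.x Dropbox Python SDK is used roughly as in the sketch below (based on my reading of that SDK's docs): it lets you retry interrupted chunks, but it performs no deduplication. The access token, paths, and bare-bones retry loop are illustrative.

```python
# Sketch of chunked (i.e. resumable) upload with the 1.x Dropbox Python SDK.
import os

from dropbox import client, rest

ACCESS_TOKEN = 'YOUR_OAUTH2_ACCESS_TOKEN'   # placeholder
dbx = client.DropboxClient(ACCESS_TOKEN)

local_path = '/path/to/local/file.bin'
size = os.path.getsize(local_path)

with open(local_path, 'rb') as f:
    uploader = dbx.get_chunked_uploader(f, size)
    while uploader.offset < size:
        try:
            uploader.upload_chunked()       # sends the next chunk from uploader.offset
        except rest.ErrorResponse:
            pass                            # on a dropped connection, loop and retry
    uploader.finish('/remote/file.bin')     # commits the chunks to this Dropbox path
```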

Uploading to Blobstore without using blobstore.create_upload_url

I would like to create an app using Python on Google App Engine to handle file uploads and store them in the Blobstore.
However, the Blobstore currently requires the use of blobstore.create_upload_url to create the URL that the file-upload form posts to. Since I am uploading from another server, my question is: is it possible to upload a file to the GAE Blobstore without using that dynamic URL from blobstore.create_upload_url?
FYI, it would be OK for me to request an upload URL from the Python script before uploading from the other server, but this creates extra latency, which is not what I want. I also read about the so-called "file-like API" at http://code.google.com/intl/en/appengine/docs/python/blobstore/overview.html#Writing_Files_to_the_Blobstore, but the documentation didn't seem to cover the uploading part.
Also, I previously tried to use the datastore for file uploads, but the max file size is 1MB, which is not enough for my case. Please kindly advise, thanks.
There are exactly two ways to write files to the blobstore: 1) using create_upload_url, and posting a form with a file attachment to it, 2) writing to the blobstore directly using an API (with a solution here for large files).
If you want a remote server to be able to upload, you have the same two choices:
1) The remote server first requests a URL. You have a handler whose only job is to return such a URL. With this URL, the remote server crafts a properly-encoded POST message and posts it to the URL.
2) You send the file data as an HTTP parameter to a handler on your App Engine server, which uses the blobstore write API to write the file directly. Request size is limited to 32MB, so the maximum file size you can write using this method will be slightly under that.
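A rough sketch of option 2, assuming the (experimental, since-deprecated) Files API that the "Writing Files to the Blobstore" page linked above describes; the '/direct-write' route and the plain-text response are placeholders.

```python
# Sketch of option 2: a remote server POSTs raw file bytes to this handler, which
# writes them into the Blobstore via the Files API.
import webapp2
from google.appengine.api import files


class DirectWriteHandler(webapp2.RequestHandler):
    def post(self):
        data = self.request.body               # subject to the ~32MB request limit
        file_name = files.blobstore.create(
            mime_type=self.request.headers.get('Content-Type',
                                               'application/octet-stream'))
        with files.open(file_name, 'a') as f:
            f.write(data)
        files.finalize(file_name)               # blob becomes readable only after this
        blob_key = files.blobstore.get_blob_key(file_name)
        self.response.write(str(blob_key))


app = webapp2.WSGIApplication([('/direct-write', DirectWriteHandler)])
```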
