How to do chunk deduplication-based upload in Dropbox API? - python

I am using the Dropbox API (Python version) and want to replicate a piece of functionality from the Dropbox client-side software.
With the API, I can call a function like put_file() to upload a file to my Dropbox account.
The Dropbox client implements a per-user deduplication mechanism: it transmits the chunk/file hash to the server before transmitting the chunk/file itself.
If you uploaded a file F before and the server now finds a hash match, you don't need to transmit the chunk/file again.
put_file() seems to upload the whole file every time and does no chunking.
upload_chunk() also looked promising, but it doesn't seem to do what I need.
How can I do chunk-based deduplication with the Dropbox API?
(For example, upload the hash of a particular chunk and have the server reply whether there is a hash match.)

According to this announcement, the purpose of chunked upload is to cope with spotty connections by letting you upload a large file in pieces; it is not about deduplication.
If you take a look through the Core API documentation (not that much to read, really), there is no mention anywhere of deduplication being offered through the API. Whether you use Python or any other language or library, without the published API supporting deduplication there is no way for you to access this functionality.
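For reference, here is roughly how chunked (resumable) upload is driven; it only helps with large files and flaky connections, not deduplication. This is a minimal sketch based on the v1 Core API Python SDK's ChunkedUploader pattern, so the exact method names may differ in your SDK version:

    import os
    from dropbox import client, rest

    dropbox_client = client.DropboxClient('<ACCESS_TOKEN>')

    size = os.path.getsize('big_file.bin')
    with open('big_file.bin', 'rb') as f:
        uploader = dropbox_client.get_chunked_uploader(f, size)
        while uploader.offset < size:
            try:
                # Sends the next chunk; on failure, resume from uploader.offset.
                uploader.upload_chunked()
            except rest.ErrorResponse:
                pass  # retry; the offset tells the server where to continue
        # Commit the chunks to a path in the user's Dropbox.
        uploader.finish('/big_file.bin')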

Related

Best way to use python library in Firebase project

I found a Python package that I want to use within my Angular-based Firebase project (it does some complex analysis of a text file).
What is the best way to use it? I see the following options:
1. Own Docker container with Flask, running on Cloud Run (e.g. like that): pass the file in an AJAX request, return a JSON result. Downsides: a separate endpoint that has to be noted down somewhere in the main project, and a separate repository, i.e. not alongside the other Node.js cloud functions.
2. Call the Python script from a Node.js cloud function (like this). Downsides: a bit hacky, piping the file and log output around as strings, and probably not easy to get all the Python dependencies working (would this work at all?).
3. A totally independent microservice that just takes the file, analyses it and sends back JSON. Maybe as an AWS Lambda? Downside: again "cut out" of the main project.
I'd love to have a "clean and easy" integration into my existing Node.js cloud functions that I use in Firebase, so the Firebase CLI can take over all the URL/endpoint handling etc. But I don't see a way to do this.
A better-encapsulated approach would be to go with option 1 or 3 and have a Node.js cloud function call the endpoint. That way the client code never calls the endpoint directly, and I can reconfigure things without having to update the client code.
Am I missing an approach? What would be the best way to do this?
Use case: a user uploads a file, and the file plus some other values are saved to their account. The content of the file should be analysed (this can happen asynchronously) and the results should be available to show to the user.
I went with option 1 and it is working quite well so far. It's very easy to deploy the dockerized Flask server on Cloud Run, and it's possible to control access and authentication from the Firebase function to the Cloud Run container with Google's IAM.
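A minimal sketch of what such a Flask service could look like on Cloud Run; the /analyze route and analyze_text function are hypothetical placeholders for the actual analysis package:

    # app.py - minimal Flask service for Cloud Run
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def analyze_text(text):
        # Placeholder for the real analysis of the text file.
        return {"length": len(text)}

    @app.route("/analyze", methods=["POST"])
    def analyze():
        # The file arrives in a multipart/form-data field named "file".
        uploaded = request.files["file"]
        return jsonify(analyze_text(uploaded.read().decode("utf-8")))

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)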

Make the Google python client library for accessing Google cloud storage hit a stubbed API

I am writing an application that uses Google's python client for GCS.
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
I've had no issues using this, until I needed to write my functional tests.
The way our organization tests integrations like this is to write a simple stub of the API endpoints I hit, and point the Google client library (in this case) to my stub, instead of needing to hit Google's live endpoints.
I'm using a service account for authentication and am able to point the client at my stub when fetching a token because it gets that value from the service account's json key that you get when you create the service account.
What I don't seem able to do is point the client library at my stubbed API instead of making calls directly to Google.
Some workarounds that I've thought of, but don't like, are:
- Allow the tests to hit the live endpoints.
- Put in some configuration that toggles between the real Google client library and a mocked version of it. I'd rather stub the API than have mock code deployed to production.
Any help with this is greatly appreciated.
I've done some research and it seems there is nothing supported specifically for Cloud Storage in Python. I found this GitHub issue entry with a related discussion, but for Go.
I think you could open an issue in the public issue tracker asking for this functionality. I'm afraid that, for now, it's easier to keep using your second workaround.
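A minimal sketch of that second workaround using the standard library's unittest.mock, so no mock code ships to production; the myapp.gcs module and upload_report function are hypothetical stand-ins for wherever your code builds the storage client:

    # test_gcs.py - replace the real client with a mock inside the test only
    from unittest import mock

    import myapp.gcs  # hypothetical module doing: from google.cloud import storage

    def test_upload_report_writes_blob():
        with mock.patch("myapp.gcs.storage.Client") as client_cls:
            fake_blob = client_cls.return_value.bucket.return_value.blob.return_value

            # Code under test is assumed to create a Client, get bucket/blob, upload.
            myapp.gcs.upload_report("my-bucket", "report.txt", b"hello")

            fake_blob.upload_from_string.assert_called_once_with(b"hello")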

What credentials should I use to allow the users to upload directly to GCS?

I am making an application in GAE (python) which allows users (who must be logged in with a Google Account) to upload files to Google Cloud Storage directly. The upload would happen asynchronously (a button would execute some javascript code to send the XHR PUT request with the file).
I decided to use the JSON API, and I'm trying to do a multipart upload. According to the docs I need to include an Authorization header with a token. I was wondering, since I don't need access to anything in the user's account (like calendar or that kind of thing), what would be the best way to achieve this?
Is it possible to create a temporary access token from the application default credentials and send that to the user within the html file to use it then from the js function? If not, what should I do?
You would need to ask the user to grant you the Google Cloud Storage write scopes (which I would strongly recommend users not grant to any random application). You would also need to grant the end users write permission on your bucket (which also means they can delete everything in it). Unless you know all of your end users up front, this means making the bucket publicly writable, which I doubt is something you want to do.
I strongly recommend delegating access to your service account instead. Unfortunately the JSON API does not currently support any form of authentication delegation, but the XML API does support signed URLs.
https://cloud.google.com/storage/docs/access-control/signed-urls
You only need to use signed URLs for the client-side uploads; everything else can use the JSON API.
There are three options:
Just sign a simple PUT request (see the sketch after this list)
https://cloud.google.com/storage/docs/xml-api/put-object-upload#query_string_parameters
Use a form POST and sign a policy document
https://cloud.google.com/storage/docs/xml-api/post-object#policydocument
Initiate a resumable upload server side and pass the upload URL back to the client. I would only recommend this option if being able to resume the upload is important (e.g. large uploads).
https://cloud.google.com/storage/docs/access-control/signed-urls#signing-resumable
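For the first option, a minimal sketch of generating a signed PUT URL server-side with the google-cloud-storage library; the bucket and object names are placeholders, and the exact generate_signed_url parameters depend on your library version:

    # Server side: create a short-lived URL the browser can PUT the file to.
    import datetime

    from google.cloud import storage

    def make_upload_url(bucket_name, object_name):
        client = storage.Client()  # uses the service account credentials
        blob = client.bucket(bucket_name).blob(object_name)
        return blob.generate_signed_url(
            version="v4",
            expiration=datetime.timedelta(minutes=15),
            method="PUT",
            content_type="application/octet-stream",
        )

    # The client then sends: XHR PUT <signed url> with the matching
    # Content-Type header and the file as the request body.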

Can I upload a file to GCS from Google Endpoints?

I'm trying to upload a file from a REST API (Google Endpoints) to GCS, but I keep getting errors. I don't know if I'm going about it the wrong way or if Google Endpoints simply cannot upload files.
I want my customers to upload files to my project's bucket.
I read "Endpoints doesn't accept the multipart/form-data encoding so you can't upload the image directly to Endpoints".
Mike answered me in this post, but I don't know how to implement that in my project.
I'm using this library (Python):
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
If it is possible, what's the best way? Any example?
Thanks so much.
I think what Mike means in that post is that you should use the Blobstore API to upload the file to GCS, rather than sending it through Endpoints and then moving the data over to the Blobstore.
But that depends on what platform your client is. If you have a web-based client, use the ordinary way just as Mike explained (an HTML form and an upload handler). If you have an Android or other mobile client, you can use the GCS client library or the GCS REST API.
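A minimal sketch of the web-based route, using the Blobstore API to upload straight into a GCS bucket; the bucket name and handler paths are placeholders:

    # Hypothetical webapp2 handlers; the file bytes go directly to GCS.
    import webapp2
    from google.appengine.ext import blobstore
    from google.appengine.ext.webapp import blobstore_handlers

    class FormHandler(webapp2.RequestHandler):
        def get(self):
            # The browser POSTs to upload_url; App Engine writes the file to the
            # given GCS bucket and then calls /upload_done with the file info.
            upload_url = blobstore.create_upload_url(
                '/upload_done', gs_bucket_name='my-project-bucket')
            self.response.write(
                '<form action="%s" method="POST" enctype="multipart/form-data">'
                '<input type="file" name="file"><input type="submit"></form>'
                % upload_url)

    class UploadDoneHandler(blobstore_handlers.BlobstoreUploadHandler):
        def post(self):
            # FileInfo carries the GCS object name of the stored file.
            file_info = self.get_file_infos('file')[0]
            self.response.write(file_info.gs_object_name)

    app = webapp2.WSGIApplication([
        ('/form', FormHandler),
        ('/upload_done', UploadDoneHandler),
    ])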

Google App Engine: upload file and other fields in the same request

I want to upload an image to the Blobstore because I want to support files larger than 1 MB. The only way I can find is for the client to issue a POST with the metadata (geo-location, tags, and so on), which the server puts in an entity. Into this entity the server also puts the key of the blob where the actual image data will be stored, and it concludes the request by returning to the client the URL obtained from create_upload_url(). This works, but it can leave things inconsistent: if the second request is never issued, the blob is never filled and the entity ends up pointing to an empty blob.
The only solution I can see is to trigger a deferred task that checks whether the blob was ever filled by an upload. I'm not a big fan of that solution, so I'm wondering whether anybody has a better one in mind.
I went through exactly the same thought process, but in Java, and ended up using Apache Commons FileUpload. I'm not familiar with Python, but you'll just need a way of handling a multipart/form-data upload.
I upload the image and my additional fields together, using jQuery to assemble the multipart form data, which I then POST to my server.
On the server side I then take the file and write it to Google Cloud Storage using the Google Cloud Storage client library (Python link). This can be done in one chunk, or 'streamed' if it's a large file. Once it's in GCS, your App Engine app can read it using the same library, or you can serve it directly with a public URL, depending on the ACL you set.
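A minimal sketch of that server side with the App Engine GCS client library, assuming a webapp2 handler that receives the form fields and the file in one multipart POST; the bucket and field names are placeholders:

    # Hypothetical webapp2 handler: metadata and image arrive in one POST.
    import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient
    import webapp2

    BUCKET = '/my-bucket'  # placeholder bucket name

    class ImageUploadHandler(webapp2.RequestHandler):
        def post(self):
            tags = self.request.get('tags')        # ordinary form field
            upload = self.request.POST['image']    # FieldStorage for the file

            # Write the file into GCS; the object only exists once close()
            # succeeds, so there is no dangling entity-points-to-empty-blob state.
            object_name = '%s/%s' % (BUCKET, upload.filename)
            dest = gcs.open(object_name, 'w', content_type=upload.type)
            dest.write(upload.file.read())
            dest.close()

            # Save the entity with tags + object_name in this same request.
            self.response.write(object_name)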
