I'm trying to figure out how to retrieve the file size of a file uploaded to Google Drive. According to the docs this should be in the file metadata... but when I request it, file size is not in the metadata at all.
file = self.drive_service.files().get(fileId=file_id).execute()
print(file)
>>> {u'mimeType': u'application/x-zip', u'kind': u'drive#file', u'id': u'0B3JGbAfem1CrWnhtWq5qYlkzSXf', u'name': u'myfile.ipa'}
What am I missing here? How can I check the file size?
By default, only a few select attributes are included in the metadata.
To request specific attributes, use the fields parameter:
file = self.drive_service.files().get(fileId=file_id, fields='size,modifiedTime').execute()
This would query a file's size and modification time.
By the way, the link you posted refers to the old v2 API. You can find a list of all file attributes in the current v3 API here.
You are missing the 'fields' query parameter here. It requests a partial response from Google APIs, which is mainly used to improve API call performance.
There is a slight change in the newly introduced v3 API: its responses include only a few default attributes, unlike the v2 API, which returns all attributes by default.
If you want all the attributes in the response, pass fields='*' in the query.
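A minimal sketch, assuming the same drive_service client and file_id as in the question:

# Hedged example: request every attribute of the file with the v3 API
file = drive_service.files().get(fileId=file_id, fields='*').execute()
print(file.get('size'))  # size comes back as a string of bytes; absent for native Google Docs files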
Hope this helps!
Passing 'size' within fields should work.
Example: fields='size'
Note that if the file is native to Google Drive, such as a file made within Google Docs or Sheets, it does not take up space against your quota and thus doesn't have a size.
I'm trying to set up a cloud function that performs user authentication. I need it to open an object stored in a Cloud Storage bucket, read its content and verify if username and password match those coming from the HTTP request.
In some cases the function needs to add a user: it should retrieve the content of the .json file stored in the bucket, add a username:password pair and save the content in the same object.
Basically, it has to modify the content of the object.
I can't find the way to do it using the Cloud Storage Python client library. None of the tutorials listed in the GitHub pages mentions anything like "modify a file" or similar concepts (at least in their short descriptions).
I also looked for a method to perform this operation in the Blob class source code, but I couldn't find it.
Am I missing something? This looks to me as a very common operation, one that should have a very straightforward method, like blob.modify(new_content).
I have to confess that I am completely new to GCP, so there is probably an obvious reason behind this (or maybe I just missed it).
Thank you in advance!
Cloud Storage is blob storage: you can only read, write, and delete objects. You can't update the content (only the metadata), and you can't move/rename a file (a move or rename performs a copy (creating a new object) followed by a delete (of the old object)).
In addition, directories don't exist; all the files sit at the root level of the bucket. The file name contains the full path from the root to the leaf. The / is only a human representation of folders (and the UI uses that representation), but the directories are purely virtual.
Finally, you can't search by file suffix, only by prefix of the file name (including the full path from the root /).
In summary, it's not a file system, it's blob storage. Change your design or your file storage option.
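That said, the read-modify-write flow from the question can still be achieved by overwriting the object as a whole: download it, change it in memory, and upload it again. A minimal sketch with the Python client library (the bucket and object names are placeholders, and it assumes the object already holds a JSON dictionary):

import json
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-users-bucket')      # placeholder bucket name
blob = bucket.blob('users.json')               # placeholder object name

users = json.loads(blob.download_as_text())    # read the current content
users['alice'] = 'hashed-password'             # modify it in memory
blob.upload_from_string(json.dumps(users),     # overwrite the whole object
                        content_type='application/json')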
I'm creating an application to backup files stored in Google Drive using Python. The first problem I encountered is that for some reason it just lists up to 100 files in every folder. I think I'm using the correct scope ('https://www.googleapis.com/auth/drive') and here is the line of code I'm using to list the contents of a folder
folder_contents = service.files().list( q="'%s' in parents" % folder['id'] ).execute()
Is there any way to change the number of listed files? I found in this thread Google Drive List API - not listed all the documents? that there is a way to do this, but NumberToRetrieve does not work. I tried to test it in the Drive API reference webpage (https://developers.google.com/drive/v2/reference/files/list) but it gives me an error. I also can't figure out how to pass this parameter when I request the list from my code (I'm new to the Drive API).
Thanks in advance.
EDIT: I found a way to increase the number of files using the MaxResults flag. But it only allows listing up to 1000 files, and that's not quite enough.
It's pretty standard for all REST APIs to paginate their output lists. The Drive API is no different. You simply need to monitor the presence of a nextPageToken in the JSON response and recurse through the successive pages.
NB. Don't make the mistake, as some have, of assuming that a response with fewer than maxResults items is the last page. You should rely solely on nextPageToken.
As an aside, once you have pagination working in your code, consider setting maxResults back to its default. I have a suspicion that the non-default values are less well tested, and also more prone to timeouts.
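A rough sketch of the pagination loop in v3 style, assuming the same service object and folder ID as in the question:

files = []
page_token = None
while True:
    response = service.files().list(
        q="'%s' in parents" % folder_id,
        fields="nextPageToken, files(id, name)",
        pageToken=page_token).execute()
    files.extend(response.get('files', []))
    page_token = response.get('nextPageToken')
    if not page_token:   # no token means this was the last page
        break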
"results = service.files().list(
pageSize = 1000, q="'xxx' in parents",fields="nextPageToken, files(id, name)").execute()"
This worked for me.
The default maximum number of results per query is 100. You must use pageToken/nextPageToken to repeat the request for the next page. See Python Google Drive API - list the entire drive file tree
I'm using the Evernote API for Python to create an app that allows the user to create and update notes, but I'm having trouble understanding how to efficiently update Evernote resources. This mainly occurs when I'm converting from HTML to ENML (Evernote Markup Language), where I'm creating resources from img tags (right now I'm only considering image resources).
My question is this: how can I tell, given HTML, if a note's resources needs to be updated? I've considered comparing the image data to all of the current resources' data, but that seems really slow. Right now I just make a new resource for each img tag.
Some helpful resources I've found include the Evernote resources guide and this sample code in the Evernote SDK. Any advice is appreciated.
The best way would be a comparison of the MD5 hash of the file. Evernote notes track resources by their MD5 hash.
To see the MD5 hash of a file attached to an Evernote note, just look at the ENML elements labeled "en-media"; the form of the tags can be seen below:
<en-media type="mime-type" hash="md5-of-file" />
Where mime-type is the file type and md5-of-file is the MD5 hash of the file. To get the ENML of a note call getNote (documentation here) and make sure to specify you want the contents. The ENML contents of the note is the value of the content attribute of the object that is returned by getNote (a note object).
While hashes can be expensive, MD5 is relatively quick, and it will be faster to compute the MD5 hash of a file than to wait for the network to download images.
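A small sketch of that comparison (the variable names are only illustrative):

import hashlib

# Hash the image data pulled from the img tag...
local_hash = hashlib.md5(image_bytes).hexdigest()
# ...and compare it with the hash attribute of the matching en-media element.
if local_hash == en_media_hash:
    pass  # same file: keep the existing resource instead of creating a new one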
Also, the updateResource method documentation says:
"Submit a set of changes to a resource to the service. This can be
used to update the meta-data about the resource, but cannot be used to
change the binary contents of the resource (including the length and
hash). These cannot be changed directly without creating a new
resource and removing the old one via updateNote."
So the only way to "update" a resource is to remove the old resource from the note and create a new one in its place. You can do this by removing the Resource object from the list contained in the resources attribute of the note in question. To add a new resource, simply add a new Resource object to the same list, as in the sketch below.
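A rough sketch of that replacement with the Python SDK (the note object, the new image bytes, and old_hash_bytes, the binary MD5 digest of the outdated resource, are all assumed to exist already):

import hashlib
from evernote.edam.type import ttypes

# Build the replacement resource from the new image bytes
data = ttypes.Data(body=new_image_bytes,
                   size=len(new_image_bytes),
                   bodyHash=hashlib.md5(new_image_bytes).digest())
new_resource = ttypes.Resource(data=data, mime='image/jpeg')

# Drop the outdated resource (matched by hash) and add the new one
note.resources = [r for r in (note.resources or [])
                  if r.data.bodyHash != old_hash_bytes]
note.resources.append(new_resource)

# Remember to point the en-media tag in note.content at the new hash,
# then push the change to the service
note_store.updateNote(note)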
I've been struggling to create a file with GAE for two days now; I've examined different approaches and each one seems more complex and time-consuming than the previous one.
I've tried simply loading a page and writing the file into the response object with relevant headers:
self.response.headers['Content-Disposition'] = "attachment; filename=titles.csv"
q = MyObject.all()
for obj in q:
    title = json.loads(obj.data).get('title')
    self.response.out.write(title.encode('utf8') + "\n")
This tells me (in a very long error) that Full proto too large to save, cleared variables. Here's the full error.
I've also checked Cloud Storage, but it needs tons of info and tweaking in the Cloud Console just to get enabled, and the Blobstore, which can save stuff only into the Datastore.
Writing a file can't be this complicated! Please tell me that I am missing something.
That error doesn't have anything to do with writing a CSV, but appears to be a timeout when iterating over all MyObject entities. Remember that requests in GAE are subject to strict limits, and you are probably exceeding those. You probably want to use a cursor and the deferred API to build up your CSV in stages. But for that, you definitely will need to write to the blobstore or CS.
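A rough sketch of that cursor + deferred pattern, reusing MyObject from the question. Note that save_to_storage is a hypothetical helper standing in for the blobstore/GCS write, and carrying the accumulated rows through the task arguments is a simplification that only works for modest amounts of data:

import json
from google.appengine.ext import deferred

BATCH_SIZE = 500

def build_csv(cursor=None, rows=None):
    rows = rows or []
    q = MyObject.all()                      # the model from the question
    if cursor:
        q.with_cursor(cursor)               # resume where the last task stopped
    batch = q.fetch(BATCH_SIZE)
    for obj in batch:
        rows.append(json.loads(obj.data).get('title'))
    if len(batch) == BATCH_SIZE:
        deferred.defer(build_csv, q.cursor(), rows)   # chain the next task
    else:
        save_to_storage(u"\n".join(rows))   # hypothetical: write the CSV to blobstore/GCS

deferred.defer(build_csv)                   # kick off the first task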
I have googled and read the docs on the Google App Engine official site about the Blobstore, but there are some problems that I still don't understand. My platform is webapp.
Docs I have read:
webapp Blobstore Handlers
Blobstore Python API Overview
The Blobstore Python API
After reading all these docs, I still have some problems:
In the Blobstore Python API Overview it says: the maximum size of Blobstore data that can be read by the app with one API call is 1MB. What does this mean? Does this 1MB limit apply to send_blob()? Take the following code from webapp Blobstore Handlers as an example:
class ViewPhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, photo_key):
        self.send_blob(photo_key)
Does that mean the photo (which is uploaded and stored in the Blobstore) associated with the photo_key must be less than 1MB? From the context, I don't think so. I think the photo can be as large as 2GB, but I am not sure.
How is the Content-Type determined on send_blob()? Is it text/html or image/jpeg? Can I set it somewhere myself? The following explanation from webapp Blobstore Handlers is confusing, and quite difficult for a non-English speaker. Can someone paraphrase it with code samples? Where are the docs for send_blob()? I can't find them.
The send_blob() method accepts a save_as argument that determines whether the blob data is sent as raw response data or as a MIME attachment with a filename, which prompts web browsers to save the file with the given name instead of displaying it. If the value of the argument is a string, the blob is sent as an attachment, and the string value is used as the filename. If True and blob_key_or_info is a BlobInfo object, the filename from the object is used. By default, the blob data is sent as the body of the response and not as a MIME attachment.
There is a file http://www.example.com/example.avi which is 20MB or even 2GB. I want to fetch example.avi from the internet and store it in the Blobstore. I checked, and the urlfetch request size limit is 1MB. I searched and haven't found a solution.
Thanks a lot!
send_blob() doesn't involve your application reading the file from the API, so the 1MB limit doesn't apply. The frontend service that returns the response to the user will read the entire blob and return all of it in the response (it most likely does the reading in chunks, but this is an implementation detail that you don't have to worry about).
send_blob() sets the content type to either the blob's internally stored type or the type you specify with the optional content_type parameter to send_blob(). For the documentation, it seems you need to RTFS; there's a docstring in the google.appengine.ext.webapp.blobstore_handlers package.
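A hedged sketch of both options (the handler and key names are only illustrative):

from google.appengine.ext.webapp import blobstore_handlers

class ServePhotoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, photo_key):
        # Serve inline, overriding the stored content type:
        self.send_blob(photo_key, content_type='image/jpeg')
        # Or force a download dialog with a filename instead:
        # self.send_blob(photo_key, save_as='photo.jpg')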
There's really no great solution for fetching arbitrary files from the web and storing them in Blobstore. Most likely you'd need a service running elsewhere, like your own machine or an EC2 instance, to fetch the files and POST them to a blobstore handler in your application.