Google Drive API search for files only in My Drive - python

I am trying to retrieve all files in Google Drive, but only those in 'My Drive'. I tried including "'me' in owners" in the query, but that gives me tons of files in shared folders where I am the owner. I tried "'root' in parents" in the query, but that gives me back only files directly under My Drive, while I also need files under subfolders, subfolders of those subfolders, and so on.
I tried also setting the drive parameter but in this case the query does not retrieve anything at all:
driveid = service.files().get(fileId='root').execute()['id']
page_token = None
my_files = list()
while True:
    results = service.files().list(q="'myemail#gmail.com' in owners",
                                   pageSize=10,
                                   orderBy='modifiedTime',
                                   pageToken=page_token,
                                   spaces='drive',
                                   corpora='drive',
                                   driveId=driveid,
                                   includeItemsFromAllDrives=True,
                                   supportsAllDrives=True,
                                   fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    my_files.extend(items)
    page_token = results.get('nextPageToken', None)
    if page_token is None:
        break
print(len(my_files))
# This prints: 0
How can I get this to work?
I guess the other possibility would be to start from root, get the children, and recursively navigate the full tree, but that is going to be very slow. The same applies if I get all the files and then look up all the parents to check whether they are in My Drive or not; I have too many files and that takes hours.
Thanks in advance!

The first request you make would be for files with root as the parent. This is the top level of your Drive account.
results = service.files().list(q="'root' in parents").execute()
Now you will need to loop through the results in your code. Check whether the mime type is the folder type, 'application/vnd.google-apps.folder'. Everything that is not a directory should be a file sitting in the root directory of your Google Drive account.
For all those directories that you found, you can then make a new request to find the files in each of those directories:
results = service.files().list(q="'directoryIdFromLastRequest' in parents").execute()
You can then loop through, getting all of the files in each of the directories.
Shared with me
You can also set sharedWithMe = false in the q parameter, and this should remove all of the files that have been shared with you, causing it to only return the files that are actually yours. This used to work, but I am currently having issues with it while testing; it looks like it's a known bug: Drive.Files.list query throws error when using "sharedWithMe = false".
Speed.
The thing is, as mentioned, files.list will by default just return everything, in no particular order, so technically you could do a single files.list with sharedWithMe = false and get back all the files and directories in your Drive account. By requesting a pageSize of 1000 you will need fewer requests. Then sort it all locally on your machine once it's downloaded.
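For illustration, a minimal sketch of that single-listing approach (assuming an authorized Drive API v3 service object; the sharedWithMe = false caveat above still applies):
page_token = None
all_items = []
while True:
    # One listing over the whole corpus; sharedWithMe = false filters out items shared with you
    response = service.files().list(q="sharedWithMe = false and trashed = false",
                                    pageSize=1000,  # fewer round trips than the default page size
                                    fields="nextPageToken, files(id, name, mimeType, parents)",
                                    pageToken=page_token).execute()
    all_items.extend(response.get('files', []))
    page_token = response.get('nextPageToken')
    if page_token is None:
        break
# Sort/organise locally once everything is down, e.g. separate folders from regular files
folders = [f for f in all_items if f['mimeType'] == 'application/vnd.google-apps.folder']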
The other option would be to do as I have written above and grab each directory in turn. This will probably result in more requests.
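And a rough sketch of that directory-by-directory approach (the helper names here are mine, not from the question; same assumed service object):
FOLDER_MIME = 'application/vnd.google-apps.folder'

def list_children(service, folder_id):
    """List all direct children of one folder, following pagination."""
    children, page_token = [], None
    while True:
        response = service.files().list(q="'%s' in parents and trashed = false" % folder_id,
                                        fields="nextPageToken, files(id, name, mimeType)",
                                        pageToken=page_token).execute()
        children.extend(response.get('files', []))
        page_token = response.get('nextPageToken')
        if page_token is None:
            break
    return children

def walk_my_drive(service, folder_id='root'):
    """Recursively collect every file under My Drive, one directory at a time."""
    files = []
    for item in list_children(service, folder_id):
        if item['mimeType'] == FOLDER_MIME:
            files.extend(walk_my_drive(service, item['id']))  # recurse into the subfolder
        else:
            files.append(item)
    return files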

Possible fix here using the Google Drive API v3 with Python 3.7+.
Use the following syntax:
q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners"
This query, passed into the service.files().list method, should get you what you need: a list of all folders owned by you, which is the best workaround I could find. "'me' in owners" is the key here.
Full snippet here:
page_token = None
response = service.files().list(q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners",
                                spaces='drive',
                                fields='nextPageToken, files(id, name)',
                                pageToken=page_token).execute()
for file in response.get('files', []):
    # Process change
    print('Found file: %s (%s)' % (file.get('name'), file.get('id')))


Is it always correct to use URLs like "./about.html" or "../about.htm" instead of absolute URLs like /about?

I'm a computer science student. Recently we were tasked to develop a static HTTP server from scratch without using any HTTP modules, solely depending on socket programming. So this means that I had to write all the logic for HTTP message parsing, extracting headers, parsing URLs, etc.
However, I'm stuck with some confusion. As I'm somewhat experienced in web development, I'm used to using URLs in places like anchor tags like "/about" and "/articles/article-1". However, I've seen people sometimes refer to relative paths according to their folder structure, like "./about.html" or "../contact.html". This always seemed like a bad idea to me. However, I realized that even though my code doesn't support these kinds of URLs explicitly, they seem to work anyhow.
Following is the Python code I'm using to get the path from the HTTP message and then get the corresponding path in the file system.
def get_http_url(self, raw_request_headers: list[str]):
    """
    Method to get HTTP url by parsing request headers
    """
    if len(raw_request_headers) > 0:
        method_and_path_header = raw_request_headers[0]
        method_and_path_header_segments = method_and_path_header.split(" ")
        if len(method_and_path_header_segments) >= 2:
            """
            example: GET / HTTP/1.1 => ['GET', '/', 'HTTP/1.1'] => '/'
            """
            url = method_and_path_header_segments[1]
            return url
    return False

def get_resource_path_for_url(self, path: str | Literal[False]):
    """
    Method to get the resource path based on url
    """
    if not path:
        return False
    else:
        if path.endswith('/'):
            # Removing trailing '/' to make it easy to parse the url
            path = path[0:-1]
        # Split to see if the url also includes the file extension
        parts = path.split('.')
        if path == '':
            # if the requested path is "/"
            path_to_resource = os.path.join(
                os.getcwd(), "htdocs", "index.html")
        else:
            # Assumes the user entered a valid url with the resource's file extension as well, ex: http://localhost:2728/pages/about.html
            if len(parts) > 1:
                path_to_resource = os.path.join(
                    os.getcwd(), "htdocs", path[1:])  # Get the absolute path with the existing file extension
            else:
                # Assumes user requested a url without an extension and as such is hoping for an html response
                path_to_resource = os.path.join(
                    os.getcwd(), "htdocs", f"{path[1:]}.html")  # Get the absolute path to the corresponding html file
        return path_to_resource
So in my code, I'm not explicitly adding any logic to handle that kind of relative path. But somehow, when I use things like ../about.html in my test HTML files, it works?
Is this the expected behavior? (I would like to know where this behavior is implemented.) As of now, I'm on Windows, if that matters. And if this is expected, can I depend on this behavior and conclude that it's safe to refer to HTML files and other assets with relative paths like this on my web server?
Thanks in advance for any help, and I apologize if my question is not clear or well-formed.

Python MSAL REST Graph: is it possible to get all files in folder, not just 200, in one request?

I need to delete all files from a OneDrive folder. When I issue a request like the one shown under # Listing children(all files and folders) within General here, I only get 200 items, which seems to be the expected default behavior per this. So in order to delete all files, I repeat the request multiple times, each time deleting 200 files. Is it possible to get ALL children in one request?
This worked:
link = parent + ":/children"
while True:
    rGetCh = requests.get(link, headers=headers)
    for ch in rGetCh.json()["value"]:
        # Looping through the current list of children
        chName = urllib.parse.quote(ch["name"].encode('utf8'))
        chPath = parent + "/" + chName
    if "@odata.nextLink" in rGetCh.json():  # if this is in the response, there are more children
        link = rGetCh.json()["@odata.nextLink"]
    else:
        break
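The delete step isn't shown above; it could sit inside the for loop along these lines (the path-based DELETE built from chPath is an assumption on my part, and deleting by item id is an equally valid alternative):
        # inside the for loop above -- remove each child before moving to the next page
        requests.delete(chPath, headers=headers)  # path-based delete (assumed)
        # or, by id, which sidesteps any name-escaping issues:
        # requests.delete("https://graph.microsoft.com/v1.0/me/drive/items/" + ch["id"], headers=headers)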

(Python) Google Drive API: get mime type of file from id in a non-awkward way

I need to download Google Drive files that are attached to Google Classroom submissions.
From the Google Classroom "Submission" I get some information that does not include the mime type:
{
  "driveFile": {
    "id": "10NDXAyz8gkWzkXPS[removed]",
    "title": "git bash.PNG",
    "alternateLink": "https://drive.google.com/file/d/10NDXAyz8gkWzkX[removed]/view?usp=drive_web",
    "thumbnailUrl": "https://lh3.googleusercontent.com/nf4i[removed]=s200"
  }
}
If I understand correctly, the mime type is needed to know which method is the right one to invoke for download, i.e.
service.files().export_media(fileId=file_id, mimeType=export_mime_type)
versus
service.files().get_media(fileId=file_id)
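(For context, the usual rule of thumb is that native Google formats have to be exported to a concrete target format, while ordinary binary files are downloaded as-is; 'application/pdf' below is just an example target, not something from the Classroom payload:)
if mime_type.startswith('application/vnd.google-apps'):
    # Native Google file (Doc, Sheet, Slides, ...): must be exported to a concrete format
    request = service.files().export_media(fileId=file_id, mimeType='application/pdf')
else:
    # Regular binary file, e.g. the attached 'git bash.PNG': download the bytes directly
    request = service.files().get_media(fileId=file_id)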
So far I have found only a very awkward way to get the mime type (code at the bottom).
This is to ask if there is a less awkward way using the API; I searched but cannot find one.
In my code, splitting this into two functions is intentional; the awkwardness is in having to query by name and then check the id.
I wonder if there is some more appropriate "method" than files.list() to invoke.
And if there isn't, whether it is possible to query by id with it. I did not find that in the docs; I tried it anyway but it did not work.
def get_file_from_id_name(file_id, file_name, gdrive_service):
    """it seems I can only query by name, not by id"""
    page_token = None
    while True:
        query = "name='{}'".format(file_name)
        response = gdrive_service.files().list(spaces='drive', q=query,
                                               fields='nextPageToken, files({})'.format(ALL_FILE_FIELDS),
                                               pageToken=page_token).execute()
        for file in response.get('files', []):
            if file.get('id') == file_id:
                return file
        page_token = response.get('nextPageToken', None)
        if page_token is None:
            break
    log.info("breakpoint")
    return None

def get_mime_type_from_id_name(file_id, file_name, gdrive_service):
    file_d = get_file_from_id_name(file_id, file_name, gdrive_service)
    mime_type = file_d.get("mimeType")
    return mime_type
Sorry for the long, detailed question; I tried to make it as concise as possible.
I have seen that in Python you can use mimetypes.guess_type to retrieve the mimetype of a file. You can check this example so you can have an idea. This is how it was done in that scenario:
mimetype = MimeTypes().guess_type(name)[0]
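A minimal, self-contained sketch of that approach, using the title that is already present in the submission metadata:
from mimetypes import MimeTypes

name = "git bash.PNG"                    # 'title' field from the driveFile attachment
mime_type = MimeTypes().guess_type(name)[0]
print(mime_type)                         # -> 'image/png'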
After some further unfruitful research, it seems that there isn't a way to get the mime type from the API.
This inference is also supported by the fact that the API does not cover all the features and data available; see, for instance, the current lack of API support for a very important feature like private comments (currently approved but not yet available).

Google Sheets Find Previously Created Sheet Using Name

My use case is to use a script to create/update a sheet on my Google Drive and have it run every day so the data is correct.
My code properly creates the sheet, but when I run it each day it creates a different sheet with the same name. I want to add a try/except to see if the sheet was previously created, and if it was, just overwrite it.
I've spent a couple of hours trying to find an example where someone did this. I'm looking to return the sheet ID, whether it's newly created or previously created.
def create_spreadsheet(sp_name, creds):
    proxy = None
    # Connect to sheet API
    sheets_service = build('sheets', 'v4', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    # create spreadsheet with title 'sp_title'
    sp_title = sp_name
    spreadsheet_req_body = {
        'properties': {
            'title': sp_title
        }
    }
    spreadsheet = sheets_service.spreadsheets().create(body=spreadsheet_req_body,
                                                       fields='spreadsheetId').execute()
    return spreadsheet.get('spreadsheetId')
You want to check whether a file (Spreadsheet) with the specific filename already exists in your Google Drive.
If the file exists, you want to return its file ID.
If the file does not exist, you want to create a new Spreadsheet and return its file ID.
You want to achieve the above using google-api-python-client with Python.
If my understanding is correct, how about this modification? There is a method for confirming whether a file with a specific filename exists using the Drive API. In this modification, the Files: list method of the Drive API is used. Please think of this as just one of several answers.
Modification points:
In this modification, the Files: list method of the Drive API is used. The file is checked for with a search query.
In this case, the file is searched for by filename and mimeType, and excluding the trash.
When the file exists, its file ID is returned.
When the file does NOT exist, a new Spreadsheet is created and its file ID is returned by your script.
Modified script:
Please modify your script as follows.
def create_spreadsheet(sp_name, creds):
    proxy = None
    sp_title = sp_name

    # --- I added the script below.
    drive_service = build('drive', 'v3', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    q = "name='%s' and mimeType='application/vnd.google-apps.spreadsheet' and trashed=false" % sp_title
    files = drive_service.files().list(q=q).execute()
    f = files.get('files')
    if f:
        return f[0]['id']
    # ---

    sheets_service = build('sheets', 'v4', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    sp_title = sp_name
    spreadsheet_req_body = {
        'properties': {
            'title': sp_title
        }
    }
    spreadsheet = sheets_service.spreadsheets().create(body=spreadsheet_req_body,
                                                       fields='spreadsheetId').execute()
    return spreadsheet.get('spreadsheetId')
Note:
In this modification, I used https://www.googleapis.com/auth/drive.metadata.readonly as the scope. So please enable the Drive API, add the scope, and delete the file containing the access token and refresh token, then authorize the scopes by running the script again. By this, the additional scope can be reflected in the access token. Please be careful of this.
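As an illustration only (the exact scope list and variable name are assumptions about your setup, not code from the question):
# Sheets scope for spreadsheets().create() plus read-only Drive metadata for files().list()
SCOPES = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive.metadata.readonly',
]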
Reference:
Files: list of Drive API
If I misunderstood your question and this was not the direction you want, I apologize.

Downloading via boto

I am using the boto client to download and upload my files to S3 and do a whole bunch of other things, like copying from one folder key to another, etc. The problem arises when I try to copy a key whose size is 0 bytes. The code that I use to copy is below:
# Get the connection to the bucket
conn = boto.connect_s3(AWS_KEY, SECRET_KEY)
bucket = conn.get_bucket('mybucket')
# bucket.name is the name of my bucket
# candidate is the source key
destination_key = "destination/path/on/s3"
candidate = "the/file/to/copy"
# now copy the key
bucket.copy_key(destination_key, bucket.name, candidate) # --> This throws an exception
# just in case, see if the key ended up in the destination.
copied_key = bucket.lookup(destination_key)
The exception that I get is
S3ResponseError: 404 Not Found
<Error><Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>the/file/to/copy</Key><RequestId>ABC123</RequestId><HostId>XYZ123</HostId>
</Error>
Now I have verified that the key in fact exists by logging into the AWS console and navigating to the source key location; the key is there, and the AWS console shows that its size is 0 (there are cases in my application where I may end up with empty files, but I need them on S3).
So upload works fine, boto uploads the key without any issue, but when I attempt to copy it, I get the error that the key does not exist.
So is there any other logic that I should be using to copy such keys? Any help in this regard would be appreciated.
Make sure you include the bucket of the source key. Should be something like bucket/path/to/file/to/copy
Try this:
from boto.s3.key import Key
download_path = '/tmp/dest_test.jpg'
bucket_key = Key(bucket)
bucket_key.key = file_key # e.g. images/source_test.jpg
bucket_key.get_contents_to_filename(download_path)
