My use case is to use a script to create or update a sheet on my Google Drive and run it every day so the data stays correct.
My code creates the sheet correctly, but each daily run creates another sheet with the same name. I want to add a check (for example a try/except) to see whether the sheet was previously created and, if it was, just overwrite it.
I've spent a couple of hours trying to find an example where someone did this. I'm looking to return the sheet ID, whether the sheet is newly created or was created previously.
def create_spreadsheet(sp_name, creds):
    proxy = None
    # Connect to the Sheets API
    sheets_service = build('sheets', 'v4', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    # Create a spreadsheet with the title 'sp_title'
    sp_title = sp_name
    spreadsheet_req_body = {
        'properties': {
            'title': sp_title
        }
    }
    spreadsheet = sheets_service.spreadsheets().create(body=spreadsheet_req_body,
                                                       fields='spreadsheetId').execute()
    return spreadsheet.get('spreadsheetId')
You want to check whether a file (Spreadsheet) with a specific filename exists in your Google Drive.
If the file exists, you want to return its file ID.
If the file does not exist, you want to create a new Spreadsheet and return its file ID.
You want to achieve this using google-api-python-client with Python.
If my understanding is correct, how about this modification? The Drive API can confirm whether a file with a specific filename exists. In this modification, the Files: list method of the Drive API is used. Please think of this as just one of several possible answers.
Modification points:
The Files: list method of the Drive API is used, and the file is looked up with a search query.
The file is searched for by filename and mimeType, excluding files in the trash.
When the file exists, its file ID is returned.
When the file does NOT exist, a new Spreadsheet is created by your script and its file ID is returned.
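As a rough illustration of the search query described in the points above, the helper below builds the q string for Files: list. The name build_spreadsheet_query is hypothetical and not part of any library. Escaping single quotes with a backslash follows the Drive search-query syntax; escaping backslashes as well is my assumption, and either only matters if the title can contain those characters.

```python
SPREADSHEET_MIME = "application/vnd.google-apps.spreadsheet"


def build_spreadsheet_query(title):
    """Return a Drive `q` string matching non-trashed spreadsheets named `title`.

    Hypothetical helper for illustration; not part of google-api-python-client.
    """
    # Escape backslashes first (assumption), then single quotes,
    # so the title cannot terminate the quoted string early.
    escaped = title.replace("\\", "\\\\").replace("'", "\\'")
    return "name='%s' and mimeType='%s' and trashed=false" % (escaped, SPREADSHEET_MIME)


print(build_spreadsheet_query("Daily report"))
print(build_spreadsheet_query("Bob's data"))
```

The returned string would be passed as the q argument of drive_service.files().list(...).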
Modified script:
Please modify your script as follows.
import httplib2
from googleapiclient.discovery import build

def create_spreadsheet(sp_name, creds):
    proxy = None
    sp_title = sp_name
    # --- I added the script below.
    drive_service = build('drive', 'v3', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    q = "name='%s' and mimeType='application/vnd.google-apps.spreadsheet' and trashed=false" % sp_title
    files = drive_service.files().list(q=q).execute()
    f = files.get('files')
    if f:
        # If several files share the name, the first one returned is used.
        return f[0]['id']
    # ---
    sheets_service = build('sheets', 'v4', http=creds.authorize(httplib2.Http(proxy_info=proxy)))
    spreadsheet_req_body = {
        'properties': {
            'title': sp_title
        }
    }
    spreadsheet = sheets_service.spreadsheets().create(body=spreadsheet_req_body,
                                                       fields='spreadsheetId').execute()
    return spreadsheet.get('spreadsheetId')
Note:
In this modification, I used https://www.googleapis.com/auth/drive.metadata.readonly as the scope. So please enable the Drive API, add the scope, and delete the file that stores the access token and refresh token. Then authorize the scopes again by running the script. This way, the additional scope is reflected in the access token. Please be careful about this.
Reference:
Files: list of Drive API
If I misunderstood your question and this was not the direction you want, I apologize.
Related
I retrieved the comments of a particular cell in my Google spreadsheet using the Drive API (version 3) with OAUTH_SCOPE = "https://www.googleapis.com/auth/drive".
I get an output which is of this form:
{'kind': 'drive#comment', 'id': 'AAAAnggKMaA', 'createdTime': '2023-01-18T08:56:39.693Z', 'modifiedTime': '2023-01-18T09:03:32.426Z', 'author': {'kind': 'drive#user', 'displayName': 'Andrew Flint', 'photoLink': '//lh3.googleusercontent.com/a/AFBCDEDF3BjIhc6Hgtsb5kDdzVt54vIjG3q0W8d1CYi=s50-c-k-no', 'me': True}, 'htmlContent': 'No version specified in current.json', 'content': 'No version specified in current.json', 'deleted': False, 'resolved': False, 'anchor': '{"type":"workbook-range","uid":0,"range":"1713668520"}', 'replies': [{'kind': 'drive#reply', 'id': 'AAAAnggKMaE', 'createdTime': '2023-01-18T09:03:32.426Z', 'modifiedTime': '2023-01-18T09:03:32.426Z', 'author': {'kind': 'drive#user', 'displayName': 'Andrew Flint', 'photoLink': '//lh3.googleusercontent.com/a/ADDDGyFTp7mR3BjIhc6Hgtsb5kDdzVt54vIjG3q0W8d1CYi=s50-c-k-no', 'me': True}, 'htmlContent': 'Unable to find a package version URLfor Mono-Extended. Found\xa0 somewhat matching package details here :\xa0https://aur.archlinux.org/packages/nerd-fonts-noto-sans-mono-extended but not sure if this is the intended package', 'content': 'Unable to find a package version URLfor Mono-Extended. Found\xa0 somewhat matching package details here :\xa0https://aur.archlinux.org/packages/nerd-fonts-noto-sans-mono-extended but not sure if this is the intended package', 'deleted': False}]}
I now want to associate this comment with the row it was extracted from, using a Python script; i.e. I want to know the row index of the cell the comment is attached to, or the indices of the anchor cell.
At the moment, there does not seem to be an obvious way to do that, although I suspect the comment ID might help.
Any inputs on this will be deeply appreciated! Thanks!
I believe your goal is as follows.
You want to retrieve the row index of the row with the comment.
You want to achieve this using python.
From your previous question, you are using googleapis for python.
Issue and workaround:
When the anchor cell information is retrieved from the comment, in your sample it is 'anchor': '{"type":"workbook-range","uid":0,"range":"1713668520"}'. Unfortunately, at the current stage, the anchor cell cannot be determined from this value. Because of this, I think that your goal cannot be achieved directly with the Sheets API and Drive API. If the cell coordinate could be retrieved from "range":"1713668520", your goal could be achieved.
From the above situation, I would like to propose a workaround. My workaround is as follows.
Download the Google Spreadsheet using Drive API as XLSX data.
Parse XLSX data using openpyxl.
Using openpyxl, the comments are retrieved from XLSX data converted from Google Spreadsheet.
When this flow is reflected in a python script, how about the following sample script?
Sample script 1:
In this case, please use your script of authorization. The access token is retrieved from it. And, please set your Spreadsheet ID.
service = build("drive", "v3", credentials=creds)
access_token = creds.token  # or access_token = service._http.credentials.token
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set your sheet name.
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
response = requests.get(url, headers={"Authorization": "Bearer " + access_token})
workbook = openpyxl.load_workbook(filename=BytesIO(response.content), data_only=False)
worksheet = workbook[sheetName]
res = []
for i, row in enumerate(worksheet.iter_rows()):
    for j, cell in enumerate(row):
        if cell.comment:
            res.append({"rowIndex": i, "columnIndex": j, "comment": cell.comment.text})
print(res)
In this script, please add the following libraries.
import openpyxl
import requests
from io import BytesIO
When this script is run, the Google Spreadsheet is exported in XLSX format, the XLSX data is parsed, and the comments are retrieved. The row and column indexes (zero-based) and the comment text are returned as an array as follows. Unfortunately, the comment ID of the Drive API cannot be retrieved from XLSX data, so I included the comment text instead.
[
{'rowIndex': 0, 'columnIndex': 0, 'comment': 'sample comment'},
,
,
,
]
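Because rowIndex and columnIndex in the result above come from enumerate, they are zero-based. If a cell address in A1 notation is more convenient, a small conversion along these lines might help. The name to_a1 is hypothetical and is not part of openpyxl or the Google APIs.

```python
def to_a1(row_index, column_index):
    """Convert zero-based row/column indexes to an A1-style cell address.

    Hypothetical helper for illustration only.
    """
    if row_index < 0 or column_index < 0:
        raise ValueError("indexes must be non-negative")
    n = column_index + 1  # switch to 1-based for the letter arithmetic
    letters = ""
    while n > 0:
        # Column letters are a bijective base-26 system (A..Z, AA..AZ, ...).
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return "%s%d" % (letters, row_index + 1)


print(to_a1(0, 0))   # A1
print(to_a1(4, 26))  # AA5
```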
Sample script 2:
In sample script 2, the Google Spreadsheet is exported in XLSX format using googleapis for python.
import openpyxl
from io import BytesIO
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

service = build("drive", "v3", credentials=creds)  # Please use your client.
spreadsheetId = "###"  # Please set the Spreadsheet ID.
sheetName = "Sheet1"  # Please set your sheet name.
request = service.files().export_media(fileId=spreadsheetId, mimeType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
fh = BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%" % int(status.progress() * 100))
fh.seek(0)
workbook = openpyxl.load_workbook(filename=fh, data_only=False)
worksheet = workbook[sheetName]
res = []
for i, row in enumerate(worksheet.iter_rows()):
    for j, cell in enumerate(row):
        if cell.comment:
            res.append({"rowIndex": i, "columnIndex": j, "comment": cell.comment.text})
print(res)
In this case, googleapis for python is used, so requests is not needed.
When this script is run, the same value as with the above script is obtained.
Reference:
Files: export
Given a Google Sheets URL like https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333
How could I use the gspread API to get the name of the sheet?
I mean the name may be sheet1, sheet2, etc.
Thanks!
I believe your goal is as follows.
You want to retrieve the sheet names from a Google Spreadsheet from the URL of https://docs.google.com/spreadsheets/d/###/edit#gid=1139845333.
From "How could I use gspread api to get the name of the sheet?", you want to achieve this using gspread for Python.
In this case, how about the following sample script?
Sample script:
client = gspread.authorize(credentials)
url = "https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333"
spreadsheet = client.open_by_url(url)
sheet_names = [s.title for s in spreadsheet.worksheets()]
print(sheet_names)
In this script, please use your client = gspread.authorize(credentials).
When this script is run, the sheet names are returned as a list.
References:
open_by_url(url)
worksheets()
Added:
About your following new question,
May I know what if I only want the sheet name of a particular one? Usually, for each additional sheet we create, it comes with a series of number at the end (gid=1139845333), I just want the name for that sheet instead of all.
In this case, how about the following sample script?
Sample script:
client = gspread.authorize(credentials)
url = "https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333"
gid = "1139845333"
spreadsheet = client.open_by_url(url)  # Open the spreadsheet before listing its worksheets.
sheet_name = [s.title for s in spreadsheet.worksheets() if str(s.id) == gid]
if len(sheet_name) == 1:
    print(sheet_name)
else:
    print("No sheet of the GID " + gid)
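In the script above the gid is typed in by hand. As a sketch, both the spreadsheet ID and the gid could instead be read from the URL with the standard library. The name parse_sheet_url is hypothetical, and the code assumes the URL has the same shape as the one in the question (ID after /spreadsheets/d/, gid in the fragment or query string).

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_sheet_url(url):
    """Return (spreadsheet_id, gid) from a Google Sheets URL; gid may be None.

    Hypothetical helper; assumes the URL layout shown in the question.
    """
    parsed = urlparse(url)
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", parsed.path)
    spreadsheet_id = match.group(1) if match else None
    # The gid normally sits in the fragment (#gid=...), sometimes in the query (?gid=...).
    for part in (parsed.fragment, parsed.query):
        values = parse_qs(part).get("gid")
        if values:
            return spreadsheet_id, values[0]
    return spreadsheet_id, None


url = "https://docs.google.com/spreadsheets/d/1dprQgvpy-qHNU5eHDoOUf9qXi6EqwBbsYPKHB_3c/edit#gid=1139845333"
print(parse_sheet_url(url))
```

The returned gid is a string, so it can be compared with str(s.id) exactly as in the sample script.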
I don't seem to find many articles on this. The tutorial from Google only shows creating folders in a regular Google Drive folder.
Below is my function. It fails with googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files?alt=json returned "File not found: myteamdrivefolderid.". Details: "File not found: myteamdrivefolderid.">
My app is a Python desktop app and has already been authorized to have full Drive read/write scope.
def create_folder(service, name, parent_id=None, **kwargs):
    # Create a folder on Drive, returns the newly created folder's ID
    body = {
        'name': name,
        'mimeType': "application/vnd.google-apps.folder"
    }
    if parent_id:
        body['parents'] = [parent_id]
    if 'teamDriveId' in kwargs:
        body['teamDriveId'] = kwargs['teamDriveId']
    folder = service.files().create(body=body).execute()
    return folder['id']
I believe your goal and situation are as follows.
You want to create a new folder in a shared Drive using googleapis for python.
You have permission to create a new folder in the shared Drive.
You are already able to use the Drive API.
Modification points:
In this case, please add supportsAllDrives to the query parameters.
It seems that teamDriveId is deprecated. Please use driveId instead.
When the above points are reflected in your script, it becomes as follows.
Modified script:
def create_folder(service, name, parent_id=None, **kwargs):
    body = {
        'name': name,
        'mimeType': "application/vnd.google-apps.folder"
    }
    if parent_id:
        body['parents'] = [parent_id]
    if 'teamDriveId' in kwargs:
        body['driveId'] = kwargs['teamDriveId']  # Modified
    folder = service.files().create(body=body, supportsAllDrives=True).execute()  # Modified
    return folder['id']
Reference:
Files: create
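One way to check the request before sending it is to build the arguments for files().create separately from the API call. This is only a sketch: build_folder_request is a hypothetical name, and it assumes the parent is either the shared Drive's ID or the ID of a folder inside it.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def build_folder_request(name, parent_id=None):
    """Return keyword arguments for service.files().create(...).

    Hypothetical helper; `parent_id` is assumed to be a shared Drive ID
    or the ID of a folder inside that shared Drive.
    """
    body = {"name": name, "mimeType": FOLDER_MIME}
    if parent_id:
        body["parents"] = [parent_id]
    # supportsAllDrives tells the API that the caller can handle shared Drives.
    return {"body": body, "supportsAllDrives": True, "fields": "id"}


kwargs = build_folder_request("Reports", parent_id="myteamdrivefolderid")
print(kwargs)
```

With a real client this would be used as service.files().create(**kwargs).execute()['id'].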
I am trying to retrieve all files in Google Drive, but only those in 'My Drive'. I tried including "'me' in owners" in the query, but that gives me tons of files in shared folders where I am the owner. I tried "'root' in parents" in the query, but that returns only files directly under My Drive, while I also need files under subfolders, subfolders of those subfolders, etc.
I also tried setting the drive parameters (corpora and driveId), but in this case the query does not retrieve anything at all:
driveid = service.files().get(fileId='root').execute()['id']
page_token = None
my_files = list()
while True:
    results = service.files().list(q="'myemail#gmail.com' in owners",
                                   pageSize=10,
                                   orderBy='modifiedTime',
                                   pageToken=page_token,
                                   spaces='drive',
                                   corpora='drive',
                                   driveId=driveid,
                                   includeItemsFromAllDrives=True,
                                   supportsAllDrives=True,
                                   fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    my_files.extend(items)
    page_token = results.get('nextPageToken', None)
    if page_token is None:
        break
print(len(my_files))
# This prints: 0
How can I get this to work?
I guess the other possibility would be to start from root, get the children and recursively navigate the full tree, but that is going to be very slow. The same applies if I get all the files and then look up all their parents to check whether they are in My Drive: I have too many files, and that takes hours.
Thanks in advance!
The first request you make would be for files whose parent is root. This is the top level of your Drive account.
results = service.files().list(q="'root' in parents").execute()
Now loop through the results in your code and check whether the MIME type is that of a directory, 'application/vnd.google-apps.folder'. Everything that is not a directory should be a file sitting in the root directory of your Google Drive account.
For each of the directories you found, make a new request to list the files in that directory:
results = service.files().list(q="'directoryIdFromLastRequest' in parents").execute()
You can then loop through, getting all of the files in each of the directories.
Shared with me
You can also set sharedWithMe = false in the q parameter, which should remove all of the files that have been shared with you, so that only the files that are actually yours are returned.
This used to work, but I am currently having issues with it while testing. It looks like it is a known bug: Drive.Files.list query throws error when using "sharedWithMe = false".
Speed
As mentioned, files.list by default returns everything, in no particular order, so technically you could do a single files.list with the sharedWithMe filter and get back all the files and directories on your Drive account. By requesting a pageSize of 1000 you will make fewer requests. Then sort it all locally on your machine once it is downloaded.
The other option would be to do as I have written above and grab each directory in turn. This will probably result in more requests.
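The "download everything, then sort locally" option could look roughly like the sketch below. It assumes the listing was requested with fields="nextPageToken, files(id, name, parents)" so that each item carries its parent IDs, and that root_id is the ID returned by service.files().get(fileId='root'). The function name files_under_root and the sample data are hypothetical.

```python
def files_under_root(files, root_id):
    """Keep only the items whose parent chain leads to `root_id`.

    `files` is a list of dicts with "id" and an optional "parents" list,
    as returned by files.list when `parents` is requested in `fields`.
    """
    parents_of = {f["id"]: f.get("parents", []) for f in files}
    cache = {root_id: True}  # memoize results so each folder is resolved once

    def descends(file_id, seen=()):
        if file_id in cache:
            return cache[file_id]
        if file_id in seen or file_id not in parents_of:
            return False  # cycle guard, or a parent that was not listed
        result = any(descends(p, seen + (file_id,)) for p in parents_of[file_id])
        cache[file_id] = result
        return result

    return [f for f in files if descends(f["id"])]


# Made-up sample data standing in for a files.list result.
sample = [
    {"id": "folderA", "name": "A", "parents": ["root0"]},
    {"id": "fileB", "name": "b.txt", "parents": ["folderA"]},
    {"id": "sharedC", "name": "c.txt", "parents": ["someoneElsesFolder"]},
    {"id": "orphanD", "name": "d.txt"},
]
print([f["id"] for f in files_under_root(sample, "root0")])
```

The filtering itself needs no further API requests, since the parent IDs arrive with the same list calls.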
Possible fix here using google drive API v3 with python 3.7+
use the following syntax:
q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners"
Passing this query to the service.files().list method should get you what you need: a list of all folders owned by you, which is the best workaround I could find. "'me' in owners" is the key here.
Full snippet here:
page_token = None  # None requests the first page
response = service.files().list(q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners",
                                spaces='drive',
                                fields='nextPageToken, files(id, name)',
                                pageToken=page_token).execute()
for file in response.get('files', []):
    # Process change
    print('Found file: %s (%s)' % (file.get('name'), file.get('id')))
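The snippet above reads a single page of results. To walk every page, the nextPageToken from each response can be fed into the next request. The generator below is a sketch of that loop; iter_all_files and fetch_page are hypothetical names, and a fake two-page response stands in for the API so the example runs without credentials.

```python
def iter_all_files(fetch_page):
    """Yield every file from a paginated listing.

    `fetch_page(page_token)` must return one response dict with a "files"
    list and, when more pages exist, a "nextPageToken".
    """
    page_token = None
    while True:
        response = fetch_page(page_token)
        for item in response.get("files", []):
            yield item
        page_token = response.get("nextPageToken")
        if page_token is None:
            break


# Fake pages used only to demonstrate the loop.
_pages = {
    None: {"files": [{"id": "1", "name": "a"}], "nextPageToken": "t2"},
    "t2": {"files": [{"id": "2", "name": "b"}]},
}


def fake_fetch(page_token):
    return _pages[page_token]


print([f["name"] for f in iter_all_files(fake_fetch)])  # ['a', 'b']
```

With the real client, fetch_page could be a lambda that calls service.files().list(q=..., spaces='drive', fields='nextPageToken, files(id, name)', pageToken=token).execute().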
From this example: can I use MediaFileUpload when creating a folder? Where can I get the parent_id from?
From https://developers.google.com/drive/folder
I just know that I should use mime = "application/vnd.google-apps.folder", but how do I implement this tutorial in Python?
Thank you for your suggestions.
To create a folder on Drive, try:
def createRemoteFolder(self, folderName, parentID=None):
    # Create a folder on Drive, returns the newly created folder's ID
    body = {
        'title': folderName,
        'mimeType': "application/vnd.google-apps.folder"
    }
    if parentID:
        body['parents'] = [{'id': parentID}]
    root_folder = drive_service.files().insert(body=body).execute()
    return root_folder['id']
You only need a parent ID here if you want to create folder within another folder, otherwise just don't pass any value for that.
If you want the parent ID, you'll need to write a method to search Drive for folders with that parent name in that location (do a list() call) and then get the ID of that folder.
Edit: Note that v3 of the API uses a list for the 'parents' field, instead of a dictionary. Also, the 'title' field changed to 'name', and the insert() method changed to create(). The code from above would change to the following for v3:
def createRemoteFolder(self, folderName, parentID=None):
    # Create a folder on Drive, returns the newly created folder's ID
    body = {
        'name': folderName,
        'mimeType': "application/vnd.google-apps.folder"
    }
    if parentID:
        body['parents'] = [parentID]
    root_folder = drive_service.files().create(body=body).execute()
    return root_folder['id']
MediaFileUpload is needed only if you want to insert content. Since you only want to insert metadata (folders are only metadata), you don't need it. A regular POST with the JSON representing the folder is enough.
You can get the parent ID in several ways:
searching (files.list endpoint)
inserting a folder: this returns JSON representing the inserted folder, containing its ID
getting it yourself via the web UI (the ID is contained in the URL of your folder or file): go to the web UI, select the folder or file you want, then identify the file ID in the URL, e.g. https://drive.google.com/#folders/0B8VrsrGIcVbrRDVxMXFWVkdfejQ
The file ID is the last part of the URL, i.e. 0B8VrsrGIcVbrRDVxMXFWVkdfejQ
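If the folder URL is copied from the browser, the ID can also be cut out of it in code. This is a sketch with a hypothetical name, folder_id_from_url. It handles the #folders/<id> form shown above and, as an assumption about newer links, the /drive/folders/<id> form.

```python
import re
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the folder ID found in a Drive folder URL, or None.

    Hypothetical helper; checks the URL fragment first, then the path.
    """
    parsed = urlparse(url)
    for part in (parsed.fragment, parsed.path):
        match = re.search(r"folders/([A-Za-z0-9_-]+)", part)
        if match:
            return match.group(1)
    return None


print(folder_id_from_url("https://drive.google.com/#folders/0B8VrsrGIcVbrRDVxMXFWVkdfejQ"))
```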
How to get a file ID programmatically:
1. Use the children.list endpoint with a known file ID to get the IDs of the children of that file.
2. Use the search feature of Google Drive: the files.list endpoint with a q parameter.
3. Use aliases: the only one I know of in Google Drive is root, for the root folder of your Drive.
Using 3. and 1., you can get all the file IDs of your Drive.
I dont know how I can be clearer
import json
import requests

def create_folder(header, folder_name, folder_id, drive_id=None):
    url = 'https://www.googleapis.com/upload/drive/v3/files'
    file_metadata = {
        'name': folder_name,
        'mimeType': 'application/vnd.google-apps.folder',
        'parents': [folder_id]
    }
    # Send the folder metadata as the only part of a multipart request.
    file_withmetadata = {"data": ("metadata", json.dumps(file_metadata), "application/json; charset=UTF-8")}
    param = {"q": "'%s' in parents" % folder_id, "supportsAllDrives": "true"}
    if drive_id is not None:
        param['driveId'] = drive_id
    r = requests.post(
        url,
        headers=header,
        params=param,
        files=file_withmetadata
    )
    return json.loads(r.text)